SPARQL Examples
A machine-readable SPARQL endpoint allows querying the RDF descriptions of the datasets in the catalog using the SPARQL query language. It provides developers and professionals with great power and flexibility when composing queries and building applications.
Stored information
All the information that can be queried is loaded in a semantic database. In this type of database the information is stored in graphs.
The semantic database of datos.gob.es contains two graphs:
- http://datos.gob.es/catalogo: this graph contains the entire Data Catalog of datos.gob.es
- http://datos.gob.es/nti: contains the URIs corresponding to the primary sector taxonomy and the geographic coverage identifiers defined in Annexes IV and V of the Technical Interoperability Standard for Reuse of Information Resources (NTI)
How to use it
SPARQL query editor
To make requests you can use the portal's SPARQL query editor. These kinds of forms assist the developer by providing a text area with syntax highlighting, autocomplete functionality and the option to execute queries, among other features.
HTTP request
SPARQL queries can be made via GET
requests, which return data according to the parameters specified in the call.
The base address of the Catalog's SPARQL endpoint is:
http://datos.gob.es/virtuoso/sparql
You must add the query
parameter to this address followed by the query you want to execute.
For example, if we want to run the following query that retrieves distinct types present in the graph:
SELECT DISTINCT ?tipo
WHERE {
?x a ?tipo.
}
LIMIT 10
-
We must URL-encode it by replacing spaces with
+
, in this way:SELECT+DISTINCT+?tipo+WHERE+{?x+a+?tipo}+LIMIT+10
-
Once encoded and knowing the endpoint address, it would be executed as follows:
https://datos.gob.es/virtuoso/sparql?query=SELECT+DISTINCT+?tipo+WHERE+{?x+a+?tipo}+LIMIT+10
If you access the SPARQL endpoint base address without adding the query
parameter with the query, it will return a page not found error.
The default response format is HTML, but you can request other formats via the format
parameter — for example to obtain CSV (comma-separated values):
https://datos.gob.es/virtuoso/sparql?query=SELECT+DISTINCT+?tipo+WHERE+{?x+a+?tipo}+LIMIT+10&format=text/csv
This parameter can accept the following values:
Value | Format |
---|---|
text/html | HTML |
application/vnd.ms-excel | Spreadsheet |
text/tab-separated-values | TSV |
application/sparql-results+xml | XML |
application/sparql-results+json | JSON |
application/javascript | Javascript |
text/turtle | Turtle |
application/rdf+xml | RDF/XML |
text/plain | N-Triples |
text/csv | CSV |
Examples
Query the graphs available at the SPARQL endpoint
If you do not specify any graph when running the query, information from all available graphs will be returned.
You can list the available graphs with this query:
SELECT DISTINCT ?uri
WHERE {
GRAPH ?uri {
?s a ?t
}
}
If we want to know which graph a piece of information belongs to, we can do it like this where variable ?g
will show which graph each ?x
belongs to:
SELECT DISTINCT ?g ?tipo
WHERE {
GRAPH ?g {
?x a ?tipo.
}
}
LIMIT 100
We can get results from a single graph as follows:
SELECT DISTINCT ?tipo
WHERE {
GRAPH <http://datos.gob.es/catalogo> {
?x a ?tipo.
}
}
LIMIT 100
Now we will only see the ?x
that are in the specified graph.
Retrieve all classes present at the SPARQL endpoint
This query is very useful because it shows all types of information that are stored.
Also, to avoid getting unhelpful noise, we will specify our graphs.
First we specify the Catalog graph:
SELECT DISTINCT ?tipo
WHERE {
GRAPH <http://datos.gob.es/catalogo> {
?x a ?tipo.
}
}
Now we'll use both graphs at once using VALUES
SELECT DISTINCT ?tipo
WHERE {
GRAPH ?grafo {
?x a ?tipo.
}
VALUES ?grafo {
<http://datos.gob.es/catalogo> <http://datos.gob.es/nti>
}
}
The final result will look similar to the following:
Retrieve all datasets from the Catalog
Now that we know the types, let's request all datasets, which correspond to this URI: http://www.w3.org/ns/dcat#Dataset
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?dataset
WHERE {
?dataset a dcat:Dataset .
}
The result is a list of URLs for all datasets.
Retrieve all data services from the Catalog
Now that we know the types, let's request all data services, which correspond to this URI: http://www.w3.org/ns/dcat#DataService
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?dataService
WHERE {
?dataService a dcat:DataService .
}
The result is a list of URLs for all data services.
Get the number of main entities and publishers per Graph
A SPARQL query to quickly overview the content lists the number of resources per type — dcat:Dataset
(datasets), dcat:DataService
(data services), dcat:Distribution
(distributions) and dct:publisher
(resource publishers) — for each graph (g
) that contains content.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?g
(COUNT(DISTINCT ?dataset) AS ?numDatasets)
(COUNT(DISTINCT ?service) AS ?numDataServices)
(COUNT(DISTINCT ?distribution) AS ?numDistributions)
(COUNT(DISTINCT ?publisher) AS ?numPublishers)
WHERE {
GRAPH ?g {
?dataset a dcat:Dataset .
OPTIONAL { ?dataset dct:publisher ?publisher. }
OPTIONAL { ?dataset dcat:distribution ?distribution. }
OPTIONAL { ?service a dcat:DataService. }
}
}
GROUP BY ?g
ORDER BY DESC(?numDatasets)
LIMIT 100
The result would be:
g | numDatasets | numDataServices | numDistributions | numPublishers |
---|---|---|---|---|
http://datos.gob.es/catalogo | 97573 | 2 | 545171 | 256 |
Retrieve all properties that datasets have
We want to get more information about datasets, but we only know their URIs; let's ask for all their properties:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?propiedad
WHERE {
?dataset a dcat:Dataset .
?dataset ?propiedad ?valor .
}
These are some of the properties they have; note that by defining the prefix PREFIX dcat: <http://www.w3.org/ns/dcat#>
, the response shows dcat:keyword
instead of the full URI http://www.w3.org/ns/dcat#keyword
property |
---|
http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
http://purl.org/dc/terms/modified |
http://purl.org/dc/terms/temporal |
dcat:keyword |
Remaining properties... |
Retrieve all properties that data services have
Similarly, we can retrieve all properties of data services:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?propiedad
WHERE {
?dataService a dcat:DataService .
?dataService ?propiedad ?valor .
}
These are some of the properties they have:
Retrieve the names of organizations that publish datasets
We will retrieve the URI and the name of organizations (foaf:name
) that publish datasets. Since names may exist in different languages, we filter specifically for Spanish names (es
), but any other language tag could be used.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?publicador ?nombre
WHERE {
?x a dcat:Dataset ;
dct:publisher ?publicador .
?publicador foaf:name ?nombre.
FILTER(LANGMATCHES(LANG(?nombre), "es"))
}
The result is a list of all publishing organizations and their names:
publisher | name |
---|---|
http://datos.gob.es/recurso/sector-publico/org/Organismo/E05068001 | "Ministerio para la Transición Ecológica y el Reto Demográfico" |
http://datos.gob.es/recurso/sector-publico/org/Organismo/EA0010587 | "Instituto Nacional de Estadística" |
http://datos.gob.es/recurso/sector-publico/org/Organismo/E05068901 | "Ministerio de Asuntos Económicos y Transformación Digital" |
http://datos.gob.es/recurso/sector-publico/org/Organismo/EA0019768 | "Biblioteca Nacional de España" |
Remaining organization URIs... | Remaining organization names... |
Retrieve properties of organizations that publish datasets
With URIs we don't know the organization names; let's ask for the properties of those URIs.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?propiedad
WHERE {
?x a dcat:Dataset .
?x dct:publisher ?publicador.
?publicador ?propiedad ?valor.
}
These are all properties of publishers:
property |
---|
http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
dct:identifier |
http://xmlns.com/foaf/0.1/name |
Retrieve the names of the ten organizations with the most published datasets and show their counts
To perform this query we need to group results, order them and limit the total to 10.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?nombre
(COUNT(?x) AS ?num)
WHERE {
{ ?x a dcat:Dataset } .
?x dct:publisher ?publicador .
?publicador foaf:name ?nombre
.
}
GROUP BY ?nombre
ORDER BY DESC(?num)
LIMIT 10
The result will be:
name | num |
---|---|
"Instituto Nacional de Estadística" | 21629 |
"Instituto Canario de Estadística" | 21326 |
"Comunidad Autónoma del País Vasco" | 8707 |
"Parlamento de Canarias" | 4355 |
Remaining organization names... | Remaining published datasets... |
Retrieve the names of the ten organizations with the most published data services and show their counts
We can apply the same example as above, but this time for data services (dcat:DataService
).
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?nombre (COUNT(?x) AS ?num)
WHERE {
{ ?x a dcat:DataService } .
?x dct:publisher ?publicador .
?publicador foaf:name ?nombre .
}
GROUP BY ?nombre
ORDER BY DESC(?num)
LIMIT 10
The result will be:
name | num |
---|---|
"Ministerio para la Transición Ecológica y el Reto Demográfico" | 1 |
"Instituto Nacional de Estadística" | 1 |
Remaining organization names... | Remaining published services... |
Retrieve the names of the ten organizations with the most total published resources and show a breakdown
Combining the previous examples, we can return the number of services (dcat:DataService
) and datasets (dcat:Dataset
) per publishing organization and the total number of published entities. To do this we sum both totals and present the results ordered from highest to lowest number of published resources. Additionally, the FILTER(BOUND(?dataset) || BOUND(?service))
excludes empty organizations from the ranking, showing only those that actually contributed datasets or services to the catalog.
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?nombre
(COUNT(DISTINCT ?dataset) AS ?numDatasets)
(COUNT(DISTINCT ?service) AS ?numDataServices)
((COUNT(DISTINCT ?dataset) + COUNT(DISTINCT ?service)) AS ?numTotal)
WHERE {
?publicador foaf:name ?nombre .
FILTER(LANGMATCHES(LANG(?nombre), "es"))
OPTIONAL { ?dataset a dcat:Dataset ; dct:publisher ?publicador . }
OPTIONAL { ?service a dcat:DataService ; dct:publisher ?publicador . }
FILTER(BOUND(?dataset) || BOUND(?service))
}
GROUP BY ?nombre
ORDER BY DESC(?numTotal)
LIMIT 10
The result will be:
name | numDatasets | numDataServices | numTotal |
---|---|---|---|
"Instituto Nacional de Estadística" | 21629 | 1 | 21630 |
"Instituto Canario de Estadística" | 21326 | 0 | 21326 |
"Comunidad Autónoma del País Vasco" | 8707 | 0 | 8707 |
"Parlamento de Canarias" | 4355 | 0 | 4355 |
Remaining organization names... | Remaining published datasets... | Remaining published services... | Remaining published totals... |