SPARQL Examples

A machine-readable SPARQL endpoint allows querying the RDF descriptions of the datasets in the catalog using the SPARQL query language. It provides developers and professionals with great power and flexibility when composing queries and building applications.

Stored information

All the information that can be queried is loaded in a semantic database. In this type of database the information is stored in graphs.

The semantic database of datos.gob.es contains two graphs:

http://datos.gob.es/catalogo: this graph contains the entire Data Catalog of datos.gob.es
http://datos.gob.es/nti: contains the URIs corresponding to the primary sector taxonomy and the geographic coverage identifiers defined in Annexes IV and V of the Technical Interoperability Standard for Reuse of Information Resources (NTI)

How to use it

SPARQL query editor

To make requests you can use the portal's SPARQL query editor. These kinds of forms assist the developer by providing a text area with syntax highlighting, autocomplete functionality and the option to execute queries, among other features.

HTTP request

SPARQL queries can be made via GET requests, which return data according to the parameters specified in the call.

The base address of the Catalog's SPARQL endpoint is:

http://datos.gob.es/virtuoso/sparql

You must add the query parameter to this address followed by the query you want to execute.

For example, if we want to run the following query that retrieves distinct types present in the graph:

SELECT DISTINCT ?tipo
WHERE { 
  ?x a ?tipo.
}
LIMIT 10

We must URL-encode it by replacing spaces with +, in this way:
```
SELECT+DISTINCT+?tipo+WHERE+{?x+a+?tipo}+LIMIT+10
```

Once encoded and knowing the endpoint address, it would be executed as follows:

https://datos.gob.es/virtuoso/sparql?query=SELECT+DISTINCT+?tipo+WHERE+{?x+a+?tipo}+LIMIT+10

If you access the SPARQL endpoint base address without adding the query parameter with the query, it will return a page not found error.

The default response format is HTML, but you can request other formats via the format parameter — for example to obtain CSV (comma-separated values):

https://datos.gob.es/virtuoso/sparql?query=SELECT+DISTINCT+?tipo+WHERE+{?x+a+?tipo}+LIMIT+10&format=text/csv

This parameter can accept the following values:

Value	Format
text/html	HTML
application/vnd.ms-excel	Spreadsheet
text/tab-separated-values	TSV
application/sparql-results+xml	XML
application/sparql-results+json	JSON
application/javascript	Javascript
text/turtle	Turtle
application/rdf+xml	RDF/XML
text/plain	N-Triples
text/csv	CSV

Examples

Query the graphs available at the SPARQL endpoint

If you do not specify any graph when running the query, information from all available graphs will be returned.

You can list the available graphs with this query:

SELECT DISTINCT ?uri
WHERE { 
  GRAPH ?uri { 
    ?s a ?t 
  } 
}

If we want to know which graph a piece of information belongs to, we can do it like this where variable ?g will show which graph each ?x belongs to:

SELECT DISTINCT ?g ?tipo
WHERE { 
  GRAPH ?g { 
    ?x a ?tipo.
  } 
} 
LIMIT 100

We can get results from a single graph as follows:

SELECT DISTINCT ?tipo
WHERE { 
	GRAPH <http://datos.gob.es/catalogo> { 
		?x a ?tipo. 
  } 
} 
LIMIT 100

Now we will only see the ?x that are in the specified graph.

Retrieve all classes present at the SPARQL endpoint

This query is very useful because it shows all types of information that are stored.

Also, to avoid getting unhelpful noise, we will specify our graphs.

First we specify the Catalog graph:

SELECT DISTINCT ?tipo
WHERE { 
 GRAPH <http://datos.gob.es/catalogo> {
  ?x a ?tipo.
 } 
}

Now we'll use both graphs at once using VALUES

SELECT DISTINCT ?tipo
WHERE { 
  GRAPH ?grafo { 
    ?x a ?tipo.
    }
   VALUES ?grafo {
     <http://datos.gob.es/catalogo> <http://datos.gob.es/nti>
   } 
}

The final result will look similar to the following:

tipo
http://spdx.org/rdf/terms#Checksum
http://www.w3.org/ns/dcat#Distribution
http://xmlns.com/foaf/0.1/Document
http://purl.org/dc/terms/PeriodOfTime
Remaining types...

Retrieve all datasets from the Catalog

Now that we know the types, let's request all datasets, which correspond to this URI: http://www.w3.org/ns/dcat#Dataset

PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?dataset
WHERE {
  ?dataset a dcat:Dataset .
}

The result is a list of URLs for all datasets.

Retrieve all data services from the Catalog

Now that we know the types, let's request all data services, which correspond to this URI: http://www.w3.org/ns/dcat#DataService

PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?dataService
WHERE {
  ?dataService a dcat:DataService .
}

The result is a list of URLs for all data services.

Get the number of main entities and publishers per Graph

A SPARQL query to quickly overview the content lists the number of resources per type — dcat:Dataset (datasets), dcat:DataService (data services), dcat:Distribution (distributions) and dct:publisher (resource publishers) — for each graph (g) that contains content.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?g
       (COUNT(DISTINCT ?dataset) AS ?numDatasets)
       (COUNT(DISTINCT ?service) AS ?numDataServices)
       (COUNT(DISTINCT ?distribution) AS ?numDistributions)
       (COUNT(DISTINCT ?publisher) AS ?numPublishers)
WHERE {
  GRAPH ?g {
    ?dataset a dcat:Dataset .
    OPTIONAL { ?dataset dct:publisher ?publisher. }
    OPTIONAL { ?dataset dcat:distribution ?distribution. }
    OPTIONAL { ?service a dcat:DataService. }
  }
}
GROUP BY ?g
ORDER BY DESC(?numDatasets)
LIMIT 100

The result would be:

g	numDatasets	numDataServices	numDistributions	numPublishers
http://datos.gob.es/catalogo	97573	2	545171	256

Retrieve all properties that datasets have

We want to get more information about datasets, but we only know their URIs; let's ask for all their properties:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?propiedad
WHERE {
  ?dataset a dcat:Dataset .
  ?dataset ?propiedad ?valor . 
}

These are some of the properties they have; note that by defining the prefix PREFIX dcat: <http://www.w3.org/ns/dcat#>, the response shows dcat:keyword instead of the full URI http://www.w3.org/ns/dcat#keyword

property
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://purl.org/dc/terms/modified
http://purl.org/dc/terms/temporal
dcat:keyword
Remaining properties...

Retrieve all properties that data services have

Similarly, we can retrieve all properties of data services:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT DISTINCT ?propiedad
WHERE {
  ?dataService a dcat:DataService .
  ?dataService ?propiedad ?valor . 
}

These are some of the properties they have:

property
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://purl.org/dc/terms/license
http://purl.org/dc/terms/accessRights
http://data.europa.eu/r5r/applicableLegislation
Remaining properties...

Retrieve the names of organizations that publish datasets

We will retrieve the URI and the name of organizations (foaf:name) that publish datasets. Since names may exist in different languages, we filter specifically for Spanish names (es), but any other language tag could be used.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?publicador ?nombre
WHERE {
  ?x a dcat:Dataset ;
     dct:publisher ?publicador .
  ?publicador foaf:name ?nombre.
  FILTER(LANGMATCHES(LANG(?nombre), "es"))
}

The result is a list of all publishing organizations and their names:

publisher	name
http://datos.gob.es/recurso/sector-publico/org/Organismo/E05068001	"Ministerio para la Transición Ecológica y el Reto Demográfico"
http://datos.gob.es/recurso/sector-publico/org/Organismo/EA0010587	"Instituto Nacional de Estadística"
http://datos.gob.es/recurso/sector-publico/org/Organismo/E05068901	"Ministerio de Asuntos Económicos y Transformación Digital"
http://datos.gob.es/recurso/sector-publico/org/Organismo/EA0019768	"Biblioteca Nacional de España"
Remaining organization URIs...	Remaining organization names...

Retrieve properties of organizations that publish datasets

With URIs we don't know the organization names; let's ask for the properties of those URIs.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?propiedad
WHERE {
  ?x a dcat:Dataset .
  ?x dct:publisher ?publicador. 
  ?publicador ?propiedad ?valor. 
}

These are all properties of publishers:

property
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
dct:identifier
http://xmlns.com/foaf/0.1/name

Retrieve the names of the ten organizations with the most published datasets and show their counts

To perform this query we need to group results, order them and limit the total to 10.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?nombre
 (COUNT(?x) AS ?num)
WHERE {
  { ?x a dcat:Dataset } .
  ?x dct:publisher ?publicador .
  ?publicador foaf:name ?nombre
 .
}
GROUP BY ?nombre
 
ORDER BY DESC(?num)
LIMIT 10

The result will be:

name	num
"Instituto Nacional de Estadística"	21629
"Instituto Canario de Estadística"	21326
"Comunidad Autónoma del País Vasco"	8707
"Parlamento de Canarias"	4355
Remaining organization names...	Remaining published datasets...

Retrieve the names of the ten organizations with the most published data services and show their counts

We can apply the same example as above, but this time for data services (dcat:DataService).

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?nombre (COUNT(?x) AS ?num)
WHERE {
  { ?x a dcat:DataService } .
  ?x dct:publisher ?publicador .
  ?publicador foaf:name ?nombre .
}
GROUP BY ?nombre
ORDER BY DESC(?num)
LIMIT 10

The result will be:

name	num
"Ministerio para la Transición Ecológica y el Reto Demográfico"	1
"Instituto Nacional de Estadística"	1
Remaining organization names...	Remaining published services...

Retrieve the names of the ten organizations with the most total published resources and show a breakdown

Combining the previous examples, we can return the number of services (dcat:DataService) and datasets (dcat:Dataset) per publishing organization and the total number of published entities. To do this we sum both totals and present the results ordered from highest to lowest number of published resources. Additionally, the FILTER(BOUND(?dataset) || BOUND(?service)) excludes empty organizations from the ranking, showing only those that actually contributed datasets or services to the catalog.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?nombre 
       (COUNT(DISTINCT ?dataset) AS ?numDatasets)
       (COUNT(DISTINCT ?service) AS ?numDataServices)
       ((COUNT(DISTINCT ?dataset) + COUNT(DISTINCT ?service)) AS ?numTotal)
WHERE {
  ?publicador foaf:name ?nombre .
 
  FILTER(LANGMATCHES(LANG(?nombre), "es"))  
  
  OPTIONAL { ?dataset a dcat:Dataset ; dct:publisher ?publicador . }
  OPTIONAL { ?service a dcat:DataService ; dct:publisher ?publicador . }
  
  FILTER(BOUND(?dataset) || BOUND(?service))
}
GROUP BY ?nombre
ORDER BY DESC(?numTotal)
LIMIT 10

The result will be:

name	numDatasets	numDataServices	numTotal
"Instituto Nacional de Estadística"	21629	1	21630
"Instituto Canario de Estadística"	21326	0	21326
"Comunidad Autónoma del País Vasco"	8707	0	8707
"Parlamento de Canarias"	4355	0	4355
Remaining organization names...	Remaining published datasets...	Remaining published services...	Remaining published totals...