Practical guide for the publication of linked data

Document date: 27-01-2022

Guidance for publishing linked data in RDF

It is important to publish open data following a series of guidelines that facilitate its reuse, including the use of common schemas, such as standard formats, ontologies and vocabularies. In this way, datasets published by different organizations will be more homogeneous and users will be able to extract value more easily.

One of the most recommended families of formats for publishing open data is RDF (Resource Description Framework). It is a standard model for data interchange on the web, recommended by the World Wide Web Consortium and highlighted in both the FAIR principles and the five-star scheme for open data publishing.

RDF is the foundation of the semantic web, as it allows relationships between entities, properties and values to be represented as graphs. In this way, data and metadata are automatically interconnected, generating a network of linked data that is easier for reusers to exploit. This also requires the use of agreed data schemas (vocabularies or ontologies), with common definitions to avoid misunderstandings or ambiguities.
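To make the graph idea concrete, here is a minimal sketch in plain Python, not taken from the guide. It treats each statement as a (subject, predicate, object) triple and prints the triples in N-Triples syntax. The `http://example.org/resource/` namespace, the `Literal` helper class and the function names are assumptions made for illustration; only the RDF, RDFS and schema.org terms are real vocabulary identifiers.

```python
from dataclasses import dataclass

# Hypothetical namespace for our own resources (illustration only).
EX = "http://example.org/resource/"
# Real, widely used vocabulary terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SCHEMA = "https://schema.org/"


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper that marks a plain value, as opposed to a URI."""
    value: str


# Each triple is (subject, predicate, object); URIs are plain strings.
TRIPLES = [
    (EX + "madrid", RDF_TYPE, SCHEMA + "City"),
    (EX + "madrid", RDFS_LABEL, Literal("Madrid")),
    (EX + "madrid", SCHEMA + "containedInPlace", EX + "spain"),
    (EX + "spain", RDFS_LABEL, Literal("Spain")),
]


def escape(text):
    """Escape backslashes and double quotes inside a literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(triples):
    """Serialize the triples, one N-Triples statement per line."""
    lines = []
    for s, p, o in triples:
        obj = f'"{escape(o.value)}"' if isinstance(o, Literal) else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def linked_resources(triples, subject):
    """Return the URIs that a subject points to (the edges of the graph)."""
    return [o for s, _, o in triples if s == subject and not isinstance(o, Literal)]


print(to_ntriples(TRIPLES))
```

Because the object of one triple (`.../spain`) is the subject of another, the statements join into a graph rather than remaining isolated rows.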

To promote the use of this model, datos.gob.es offers users the "Practical guide for the publication of linked data", prepared in collaboration with the Ontology Engineering Group (Artificial Intelligence Department, ETSI Informáticos, Polytechnic University of Madrid).

The guide highlights a series of best practices, tips and workflows for the creation of RDF datasets from tabular data, in an efficient and sustainable way over time.

Who is the guide aimed at?

The guide is aimed at those responsible for open data portals and those preparing data for publication on such portals. No prior knowledge of RDF, vocabularies or ontologies is required, although a technical background in XML, YAML, SQL and a scripting language such as Python is recommended.

What does the guide include?

After a short introduction, the guide covers the necessary theoretical concepts (triples, URIs, domain-specific controlled vocabularies, etc.) and explains how information is organized in RDF and how naming strategies work.
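As an illustration of what a naming strategy can look like, the sketch below mints a stable URI from a resource type and a key. It is only one possible convention, not necessarily the one the guide recommends. The base `http://example.org/resource/` and the functions `slugify` and `mint_uri` are hypothetical.

```python
import re
import unicodedata

# Hypothetical base URI; a real portal would use a domain it controls.
BASE = "http://example.org/resource/"


def slugify(text):
    """Lowercase the text, strip accents, and join words with hyphens."""
    normalized = unicodedata.normalize("NFKD", text)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(resource_type, key):
    """Build a URI of the form BASE/<type>/<key> from human-readable inputs."""
    return f"{BASE}{slugify(resource_type)}/{slugify(key)}"


print(mint_uri("Municipality", "Alcalá de Henares"))
```

Deriving the URI deterministically from the source data means that regenerating the dataset yields the same identifiers, so external links to them stay valid.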

Next, the guide describes in detail how to transform a CSV file, the most common format in open data portals, into a normalized RDF dataset. The result is based on controlled vocabularies and enriched with external data that adds context to the original data. These steps are as follows:

  • Step 1: Selection of controlled vocabulary for the domain.
  • Step 2: Cleaning and preparation of CSV data.
  • Step 3: Construction of transformation rules (mappings).
  • Step 4: Generation of RDF data from the rules.

Steps to follow to transform CSV data to RDF. Source: Practical guide for the publication of linked data. datos.gob.es.
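The four steps above can be sketched end to end in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the tooling or mapping language used in the guide. The sample rows, the `http://example.org/` URIs, the `MAPPING` dictionary and the function `csv_to_ntriples` are invented for the example; `rdfs:label`, `rdf:type` and `xsd:integer` are real identifiers.

```python
import csv
import io

# Invented sample data: note the stray spaces and the missing population.
CSV_TEXT = """id,name,population
001, Town A ,1200
002,Town B,
"""

# Step 1: choose vocabulary terms (the example.org terms are hypothetical).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
CLASS = "http://example.org/vocab#Municipality"
BASE = "http://example.org/municipality/"

# Step 3: transformation rules, here column -> (predicate, optional datatype).
MAPPING = {
    "name": ("http://www.w3.org/2000/01/rdf-schema#label", None),
    "population": ("http://example.org/vocab#population", XSD_INTEGER),
}


def csv_to_ntriples(csv_text):
    """Apply MAPPING to every CSV row and return N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Step 2: minimal cleaning, trimming whitespace around every value.
        row = {column: value.strip() for column, value in row.items()}
        subject = f"<{BASE}{row['id']}>"
        # Step 4: generate the triples for this row.
        lines.append(f"{subject} <{RDF_TYPE}> <{CLASS}> .")
        for column, (predicate, datatype) in MAPPING.items():
            value = row[column]
            if not value:
                continue  # skip empty cells instead of emitting empty literals
            literal = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
            if datatype:
                literal += f"^^<{datatype}>"
            lines.append(f"{subject} <{predicate}> {literal} .")
    return lines


for line in csv_to_ntriples(CSV_TEXT):
    print(line)
```

Keeping the rules in a separate structure, rather than hard-coding them in the loop, reflects the idea behind step 3: the mappings can be reviewed and updated without touching the generation code.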

The guide ends with a section aimed at more technical profiles. It works through an example of exploiting the generated RDF data, using some of the most common programming libraries and triple stores.
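The libraries and databases used in that section are not named here, so the following is only a rough sketch of the basic operation such tools build on: matching triples against a pattern in which any position may be a wildcard. The data, the `http://example.org/` URIs and the functions `match` and `labels_of_type` are hypothetical, and literals are kept as plain strings for brevity.

```python
# Hypothetical identifiers and data, for illustration only.
EX = "http://example.org/"
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

TRIPLES = [
    (EX + "town/001", TYPE, EX + "vocab#Municipality"),
    (EX + "town/001", LABEL, "Town A"),
    (EX + "town/002", TYPE, EX + "vocab#Municipality"),
    (EX + "town/002", LABEL, "Town B"),
    (EX + "region/01", LABEL, "Region 1"),
]


def match(triples, s=None, p=None, o=None):
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def labels_of_type(triples, rdf_class):
    """Return the sorted labels of every resource typed as rdf_class."""
    subjects = {s for s, _, _ in match(triples, p=TYPE, o=rdf_class)}
    return sorted(o for s, _, o in match(triples, p=LABEL) if s in subjects)


print(labels_of_type(TRIPLES, EX + "vocab#Municipality"))
```

A dedicated RDF library or triple store would replace the linear scan with indexes and a query language such as SPARQL, but the pattern-matching idea is the same.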

Additional materials

The practical guide for publishing linked data is complemented by a cheatsheet that summarizes the most important information in the guide, and by a series of videos that walk through the steps for transforming CSV files into RDF. The videos are grouped into two series that correspond to the steps explained in the practical guide:

1) Series of explanatory videos for the preparation of CSV data using OpenRefine. This series explains the steps to be taken to prepare a CSV file for its subsequent transformation into RDF:

  • Video 1: Pre-loading tabular data and creating an OpenRefine project.
  • Video 2: Modifying column values with transformation functions.
  • Video 3: Generating values for controlled lists or SKOS.
  • Video 4: Linking values with external sources (Wikidata) and downloading the file with the new modifications.

2) Series of explanatory videos for the construction of transformation rules or CSV to RDF mappings. This series explains the steps to be taken to transform a CSV file into RDF by applying transformation rules:

  • Video 1: Downloading the basic template for the creation of transformation rules and creating the skeleton of the transformation rules document.
  • Video 2: Specifying the references for each property and how to add the Wikidata reconciled values obtained through OpenRefine.

Below you can download the complete guide, as well as the cheatsheet. To watch the videos, visit our YouTube channel.

Documentation

    • Cheatsheet: Practical guide for publishing linked data in RDF (only available in Spanish), pdf, 218.61 KB
    • Cheatsheet: Practical guide for publishing linked data in RDF (only available in Spanish), pptx, 116.31 KB
    • Practical guide for publishing linked data in RDF (only available in Spanish), pdf, 3.5 MB
    • Practical guide for publishing linked data in RDF (only available in Spanish), docx, 9.84 MB