Practical guide for improving the quality of open data

Document date: 29-09-2022

When publishing open data, it is essential to ensure its quality. Data that is well documented and of the required quality is easier to reuse, because re-users need to do less additional cleaning and processing. Poor data quality can also be costly for publishers, who may end up spending more on fixing errors than they would have spent on preventing them.

To help with this task, the Aporta Initiative has developed the "Practical guide for improving the quality of open data", a compendium of guidelines for acting on, and improving, each of the characteristics that define quality. The document takes as its reference the data.europa.eu data quality guide, published in 2021 by the Publications Office of the European Union.

Who is the guide aimed at?

The guide is aimed at open data publishers, providing them with clear guidelines on how to improve the quality of their data.

It can also help data re-users address the quality weaknesses that may be present in the datasets they work with.

What does the guide include?

The document begins by defining the characteristics that, according to ISO/IEC 25012, data must have in order to be considered quality data. They are the following:

Data quality attributes: accuracy, completeness, consistency, credibility, timeliness, accessibility, compliance, confidentiality, efficiency, precision, traceability, comprehensibility.

The bulk of the guide then describes recommendations and good practices for avoiding the most common problems that arise when publishing open data. It is structured as follows:

  • A first part setting out general guidelines to guarantee the quality of open data, such as using a standardised character encoding, avoiding duplicate records or incorporating variables with geographic information. For each guideline, the guide provides a detailed description of the problem, the quality characteristics affected and recommendations for resolving it, together with practical examples to facilitate understanding.
  • A second part with specific guidelines for ensuring the quality of open data according to the data format used. Specific guidelines are included for CSV, XML, JSON, RDF and APIs.
  • Finally, the guide also includes recommendations for data standardisation and enrichment, as well as for data documentation, and a list of useful tools for working on data quality.
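
As an illustration of the kind of check the general and CSV-specific guidelines point towards, the following is a minimal sketch and is not taken from the guide itself. It assumes the publisher has chosen UTF-8 as the standardised encoding; the helper name check_csv_quality and the sample data are hypothetical. It reports whether the file decodes in the expected encoding, which records are exact duplicates of an earlier one, and which records have a different number of fields from the header.

```python
import csv
import io


def check_csv_quality(raw: bytes, encoding: str = "utf-8") -> dict:
    """Hypothetical helper: report basic quality problems in a CSV payload.

    Assumes the first record is a header and that `encoding` is the
    encoding the publisher has standardised on (UTF-8 here).
    """
    report = {"valid_encoding": True, "duplicate_rows": [], "ragged_rows": []}

    # 1. Character encoding: the payload must decode without errors.
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        report["valid_encoding"] = False
        return report

    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:  # empty file, nothing else to check
        return report

    seen = set()
    # Record numbers start at 2 because the header is record 1.
    for record_no, row in enumerate(reader, start=2):
        # 2. Duplicate records: same values as an earlier record.
        key = tuple(row)
        if key in seen:
            report["duplicate_rows"].append(record_no)
        else:
            seen.add(key)
        # 3. Structure: every record should have as many fields as the header.
        if len(row) != len(header):
            report["ragged_rows"].append(record_no)

    return report


# Made-up sample data, including geographic coordinates as separate variables.
sample = (
    "id,municipality,lat,lon\n"
    "1,Ávila,40.656,-4.700\n"
    "2,Cádiz,36.527,-6.288\n"
    "1,Ávila,40.656,-4.700\n"
    "3,León,42.598\n"
).encode("utf-8")

print(check_csv_quality(sample))
# {'valid_encoding': True, 'duplicate_rows': [4], 'ragged_rows': [5]}
```

A check like this only flags candidate problems. Whether a repeated record is a genuine duplicate, or a short record is an error, still depends on the dataset and its documentation.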

You can download the guide at the bottom of the page (only available in Spanish).

Additional materials

The guide is accompanied by a series of infographics that compile the above guidelines:

Screenshot of the infographic "General guidelines for quality assurance of open data".

Accessible version

Screenshot of the infographic "Guidelines for quality assurance using specific data formats".

Accessible version

Documentation

    • Practical guide for improving the quality of open data (only available in Spanish)
      pdf
      1.26 MB
    • Reusable version (only available in Spanish)
      docx
      6.55 MB
    • Infographic "General guidelines for quality assurance of open data"
      jpg
      1.09 MB
    • Infographic "Guidelines for quality assurance using specific data formats" (reusable version)
      docx
      7.17 MB
    • Infographic "Guidelines for quality assurance using specific data formats"
      jpg
      1.49 MB
    • Infographic "General guidelines for quality assurance of open data" (reusable version)
      docx
      6.99 MB