Nowadays we have more and more sources of data at our fingertips. According to the European Data Portal, the impact of the open data market could reach up to EUR 334 billion and generate around 2 million jobs by 2025 (The Economic Impact of Open Data: Opportunities for value creation in Europe, 2020).
Paradoxically, however, even though data is more accessible than ever before, the possibilities for reusing it are still rather limited. Potential users of such data often face multiple barriers to access and use. Quality problems that hinder the re-use of data can arise in many places: metadata that is insufficiently descriptive or standardised, the choice of licence, the choice of format, inappropriate use of formats, or deficiencies in the data itself. Many initiatives try to measure the quality of datasets based on their metadata (date and frequency of update, licence, formats used, etc.), as is the case, for example, with the Metadata Quality Scorecard on the European Data Portal or the quality dimension of the Open Data Maturity Index.
But these analyses are insufficient since most of the time quality deficiencies can only be identified after the re-use process has started. The work involved in the cleansing and preparation processes thus becomes a major burden that is in many cases unbearable for the open data user. This leads to frustration and loss of interest on the part of the reusing sector in the data offered by public bodies, affecting the credibility of the publishing institutions and considerably lowering the expectations of return and generation of value from the reuse of open data.
These problems can be tackled, because they have been found to be largely due to the publisher not knowing how to express the data correctly in the chosen format.
For all these reasons, and with the aim of contributing to the improvement of the quality of open data, at datos.gob.es we have decided to create a collection of guides that help publishers make appropriate use of the most common formats and means of access in the field of open data.
The collection of guides starts here with a focus on the CSV format. The choice of this format is based on its popularity in the field of open data, its simplicity and its lightness in expressing data in tabular form. It is the most common format in open data catalogues; specifically, in datos.gob.es it represents 20% of the distributions, coexisting with other formats such as XLS or XLSX that could also be expressed as CSV. Moreover, it is a format that we can call hybrid because it combines the ease of automated processing with the possibility of being read directly by people with a simple text editor.
This guide covers the basic features of this type of format and a compendium of guidelines for publishing tabular data correctly, especially in CSV. The guidelines are accompanied by suggestions for free tools that stand out for how easily they handle CSV files and for the extra functionality they provide. In addition, a summary of the guidelines is available in the form of a cheat sheet for ease of use and reference.
What are the main new features of the 2025 update?
The guide was revised in 2025 to incorporate new sections on common errors and their solutions, validation of data types with practical code examples, and advanced handling of date fields. The toolbox has been extended with tools such as Rainbow CSV and OpenRefine, and the guidelines for optimising data import/export and handling large volumes of data have been improved.
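As an illustration of what validating data types and date fields in a CSV file can look like, the following is a minimal sketch using only the Python standard library. It is not taken from the guide: the sample data, the column names and the `validate_rows` helper are hypothetical, and it assumes that dates are expected in ISO 8601 form (YYYY-MM-DD).

```python
import csv
import io
from datetime import date

# Hypothetical sample: column names and values are invented for illustration.
SAMPLE_CSV = """station_id,measured_on,no2_ugm3
28079004,2025-01-15,41.5
28079008,15/01/2025,38.0
28079011,2025-01-15,n/a
"""

# Expected type of each column, expressed as a parsing function.
COLUMN_PARSERS = {
    "station_id": int,
    "measured_on": date.fromisoformat,  # assumes ISO 8601 dates (YYYY-MM-DD)
    "no2_ugm3": float,
}


def validate_rows(text):
    """Return (line_number, column, value) for every cell that fails to parse."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        for column, parse in COLUMN_PARSERS.items():
            value = row[column]
            try:
                parse(value)
            except ValueError:
                # reader.line_num counts physical lines, header included.
                errors.append((reader.line_num, column, value))
    return errors


errors = validate_rows(SAMPLE_CSV)
for line_number, column, value in errors:
    print(f"line {line_number}: {column!r} has invalid value {value!r}")
```

In this sample, the date written as day/month/year and the `n/a` placeholder in a numeric column are reported, which are typical examples of cells that force reusers to clean a file before they can work with it.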
The DCAT-AP application profile aims to describe, using metadata, the catalogs and datasets of European open data portals. To do so, DCAT-AP is based on the Data Catalog Vocabulary (DCAT), published by W3C. In particular, DCAT-AP is a specification that describes a series of restrictions (such as property ranges) on the DCAT model.
In a context of continuous economic, technological and social change, this application profile is constantly evolving and improving to meet users' demands. The body in charge of managing the maintenance and evolution of DCAT-AP is Joinup, a collaborative platform created by the European Commission and financed by the European Union through the ISA and ISA² Programmes. Through this platform, different versions of DCAT-AP and guidelines for their standard implementation have been published.
To help organizations that have doubts about how to apply this profile, here are the main DCAT-AP documents and resources available in Joinup:
| Documents | Description |
|---|---|
| DCAT-AP versions | The different versions of the DCAT-AP profile are shown on a timeline, so that you can easily access the latest one. |
| Implementation guidelines | A list of technical and organizational guidelines to facilitate the implementation of DCAT-AP, including examples of implementations that can help solve different challenges. In addition, users can share the tools they have developed (such as validators). |
| National extensions analysis of DCAT-AP | Based on the DCAT-AP specification, each EU country has produced a series of adaptations to meet its own needs. This analysis covers these extensions, looking for recurring patterns that could be used as input for future versions of DCAT-AP. |
| GeoDCAT-AP | An extension of DCAT-AP for the exchange of descriptions of geospatial datasets and services. |
| StatDCAT-AP | An extension of DCAT-AP for the exchange of descriptions of statistical datasets and services. |
| Change and Release Management Policy for DCAT-AP | Documentation on the kinds of changes that can be made to DCAT-AP. It distinguishes 3 types of changes according to their implications for interoperability: bug fixes, minor semantic changes and major semantic changes. |
| Tools library | Includes tools developed by SEMIC or the users to promote semantic interoperability. |
| Document library | This page keeps track of studies carried out under different actions of the ISA² Programme, grouped by topic. |
In addition to these resources, there are task forces that have developed reports applied to specific fields, such as research. For its part, W3C itself has also published reports and tools to help users, such as the guide Dataset Exchange Use Cases and Requirements, in this case focused on DCAT.
To stay informed about all the changes that take place and the documents that are published, users can subscribe to the GitHub project created to share experiences, challenges and suggestions for new features.
Datos.gob.es is also part of Joinup's network of collaborators, so we actively participate in the dissemination of the contents and resources created to facilitate the implementation of DCAT-AP. If you want to know more about the DCAT-AP application profile, we recommend the report DCAT-AP and its extensions: Context and evolution.
One of the main challenges that arise when addressing an Open Data initiative is to define the information architecture and facilitate interoperability between data catalogs published by different portals on the Web. In order to solve this challenge, the World Wide Web Consortium (W3C) published the Data Catalog Vocabulary (DCAT), an RDF vocabulary to describe data catalogs based on 3 key concepts: catalog, dataset and distribution.
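The relationship between the three key concepts can be sketched as a small JSON-LD document built in Python. This is only an illustrative sketch: the class and property names (`dcat:Catalog`, `dcat:dataset`, `dcat:Dataset`, `dcat:distribution`, `dcat:Distribution`, `dcat:accessURL`, `dcat:mediaType`, `dct:title`) come from the DCAT and Dublin Core Terms vocabularies, while every identifier, URL and title below is a hypothetical placeholder.

```python
import json

# Namespace prefixes for the W3C DCAT and Dublin Core Terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# A distribution is one concrete, accessible form of a dataset (here, a CSV file).
# All example.org identifiers are hypothetical.
distribution = {
    "@id": "https://example.org/dataset/air-quality/csv",
    "@type": "dcat:Distribution",
    "dcat:accessURL": {"@id": "https://example.org/files/air-quality.csv"},
    "dcat:mediaType": {"@id": "https://www.iana.org/assignments/media-types/text/csv"},
}

# A dataset groups one or more distributions of the same data.
dataset = {
    "@id": "https://example.org/dataset/air-quality",
    "@type": "dcat:Dataset",
    "dct:title": "Air quality measurements",
    "dcat:distribution": [distribution],
}

# A catalog is a curated collection of datasets.
catalog = {
    "@context": CONTEXT,
    "@id": "https://example.org/catalog",
    "@type": "dcat:Catalog",
    "dct:title": "Example open data catalog",
    "dcat:dataset": [dataset],
}

print(json.dumps(catalog, indent=2))
```

Because every portal that follows the vocabulary nests the same three classes in the same way, a harvester can walk from catalog to dataset to distribution without knowing anything portal-specific.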

Based on this vocabulary, and within the Joinup project, a collaborative platform created by the European Commission, an international group of experts developed the DCAT Application Profile for data portals in Europe (DCAT-AP): a specification that describes restrictions (such as property ranges) on the DCAT model. The objective is to facilitate homogenization and cross-searching, using metadata, between the different European data portals created by the public sector and placed at citizens' disposal for reuse.
The report DCAT-AP and its extensions: context and evolution, developed within the framework of the Aporta Initiative, aims to contextualize and delve into DCAT-AP and the DCAT vocabulary. The report includes a description of both specifications, as well as of the agencies and institutions involved in defining them.
DCAT-AP extensions and modifications
Sector extensions have been developed on the basis of DCAT-AP. The report describes some of the most relevant ones in specific areas of application: DCAT-AP HVD, an extension for the description of high-value data; GeoDCAT-AP, focused on the exchange of descriptions of geospatial datasets and services; StatDCAT-AP, an extension for the exchange of descriptions of statistical datasets and services; MLDCAT-AP, which extends DCAT-AP in the field of machine learning; and BRegDCAT-AP, for the description of fundamental aspects of public administrative records.
Since its appearance, practically all the Member States of the European Union have extended the DCAT-AP application profile to meet their needs. Special emphasis is placed on Spain, where the "extension", the Norma Técnica de Interoperabilidad de Reutilización de recursos de información (NTI-RISP), has the peculiarity of having preceded the DCAT-AP specification itself; at the time of writing, it has evolved into the DCAT-AP-ES reference extension. The NTI-RISP establishes the common framework for opening and using documents and information resources produced or held by public administrations. This technical standard aims to ensure the persistence of information and the use of formats, and to promote appropriate terms and conditions of use. Because the NTI-RISP predates the first versions of DCAT and DCAT-AP, there are some differences between them.
Finally, some DCAT-AP extensions implemented by the different Member States are listed for reference.
As a result of the demand for transparency and citizen participation, an increasing number of towns are focusing on initiatives that facilitate citizen access to the information held by institutions and administrations. However, defining, implementing and documenting an open data policy can be challenging. Some of the questions most frequently asked by those involved in these initiatives are:
- Which data are most strategic, and which fundamental aspects should be considered when publishing them?
- How can I facilitate the integration of datasets from different sources?
- What is the regulatory framework?
In this context, AENOR drew up the Technical Standard UNE 178301:2015. It provides a series of recommendations to standardize open data publication and improve data management. This Technical Standard includes a list of 11 datasets considered a priority by AENOR: shops catalog, cultural agenda, population (municipal census), air quality, contracts, initial budget and budget in execution, public parking, regular bus, traffic situation, tourist interest points and street guide.

In addition, Technical Standard UNE 178301:2015 includes recommended vocabularies to optimize data publication, framed within the Linked Data paradigm (a set of best practices, articulated through W3C standard technologies). The objective is to facilitate the development of a web of data where different elements can be linked, simplifying navigation and data location, within a common international framework.
The report "Open data representation vocabulary in Digital Cities" provides an analysis of this Technical Standard. It includes a description of each dataset, potential use cases, the legislative framework (when applicable) and common publication formats. In addition, the report includes an assessment of the adequacy and degree of development of AENOR's semantic proposal.
The Aporta Initiative publishes the Guide to publish open data quickly and easily (with CKAN), a handbook that shows how to set up an open data project without the need for extensive knowledge or prior experience in opening public sector information. In addition, this material includes a set of graphs and explanatory, visual tables designed to make the document's guidelines easier to understand.
The first part of the guide provides guidelines and recommendations for locating and preparing the data to be opened; the second part shows in detail how to publish them on the web using the open source tool CKAN.
This unit describes the main guidelines of the DCAT-AP specification, which allows catalogs of public sector datasets in Europe to be described, and of the nationally applicable Technical Interoperability Standard for Reuse. Knowing and applying them is essential to guarantee interoperability between the different existing systems.
Objectives:
- Understand the concept of interoperability as the ability of two organizations to exchange information with each other. In the EU, the European Interoperability Framework has been defined for public administrations at four levels: legal, organizational, semantic and technical.
- Approach semantic interoperability as the possibility of interaction at the data level between different systems, ensuring that the exchange of information respects the coherence and meaning of the content exchanged.
- Learn that DCAT is a specification (vocabulary) for describing data catalogs on the web, developed by W3C.
- Understand that the DCAT application profile for data portals, or DCAT-AP, is a specification based on DCAT for describing catalogs of public sector datasets in Europe.
- Gain deeper knowledge of the Technical Interoperability Standard for the Reuse of Information Resources (NTI-RISP) as the Spanish standard for selecting, identifying, describing and making available reusable datasets.
Teaching unit:
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International licence.
The Technical Standard for Interoperability for the Reuse of information resources establishes common conditions for the selection, identification, description, format, conditions of use and provision of documents and information resources prepared or kept by the public sector. These resources relate to numerous areas of interest, such as social, economic, legal, tourist, business and educational information. The standard complies with the provisions of Law 37/2007, of November 16.
These conditions are intended to facilitate and guarantee the reuse of information from public administrations, ensuring the persistence of the information, the use of formats and appropriate terms and conditions of use.