Practical guide to publishing tabular data in CSV files
Document date: 15-02-2025

Nowadays we have more and more sources of data at our fingertips. According to the European Data Portal, the impact of the open data market could reach up to EUR 334 billion and generate around 2 million jobs by 2025 ("The Economic Impact of Open Data: Opportunities for value creation in Europe", 2020).
Paradoxically, however, even though data is more accessible than ever before, the possibilities for reusing it are still rather limited. Potential users of such data often face multiple barriers to access and use. Quality problems that hinder reuse can arise in many areas: poorly descriptive or non-standardised metadata, the choice of licence, the choice of format, inappropriate use of formats, or deficiencies in the data itself. Many initiatives try to measure the quality of datasets based on their metadata (date and frequency of update, licence, formats used, etc.), as is the case, for example, with the Metadata Quality Scorecard on the European Data Portal or the quality dimension of the Open Data Maturity Index.
But these analyses are insufficient, since most of the time quality deficiencies can only be identified after the reuse process has started. The work involved in cleansing and preparing the data thus becomes a major burden that is in many cases unbearable for the open data user. This leads to frustration and loss of interest among reusers in the data offered by public bodies, which affects the credibility of the publishing institutions and considerably lowers the expectations of return and value generation from the reuse of open data.
These problems can be tackled, since they have been found to stem largely from publishers not knowing how to express the data correctly in the chosen format.
For all these reasons, and with the aim of helping to improve the quality of open data, at datos.gob.es we have decided to create a collection of guides to help publishers make appropriate use of the most common formats and means of access in the field of open data.
The collection of guides starts here with a focus on the CSV format. The choice of this format is based on its popularity in the field of open data, its simplicity and its lightness in expressing data in tabular form. It is the most common format in open data catalogues; specifically, in datos.gob.es it represents 20% of the distributions, coexisting with other formats such as XLS or XLSX whose content could also be expressed as CSV. Moreover, it is a format that we can call hybrid, because it combines ease of automated processing with the possibility of being read directly by people in a simple text editor.
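As a small illustration of this hybrid nature, the sketch below parses a CSV fragment with Python's standard csv module. The sample text, the column names and the read_rows helper are invented for this example and do not come from any datos.gob.es distribution.

```python
import csv
import io

# Hypothetical sample: the same text a person could open in any text editor.
# The second data row quotes a value that contains the delimiter.
SAMPLE_CSV = (
    "municipality,year,population\n"
    "Example Town,2024,1500\n"
    '"Villa, Example",2024,320\n'
)


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
for row in rows:
    # Every value is read as a string; type conversion is a separate step.
    print(row["municipality"], row["year"], row["population"])
```

The quoted second row shows why a CSV parser is preferable to splitting each line on commas by hand: the comma inside the quotes belongs to the value, not to the column structure.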
This guide covers the basic features of this type of format and a compendium of guidelines for correctly publishing tabular data, especially in CSV. The guidelines are accompanied by suggestions for free tools that stand out for how easy they make working with CSV files and for the extra functionality they provide. In addition, a summary of the guidelines is available as a cheat sheet for ease of use and reference.
What are the main new features of the 2025 update?
The guide was revised in 2025 to add new sections on common errors and their solutions, validation of data types with practical code examples, and advanced handling of date fields. The update also extends the toolbox with tools such as Rainbow CSV and OpenRefine, and improves the guidelines for optimising data import/export and for handling large volumes of data.
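To give a rough idea of what validating data types and date fields can involve, here is a minimal sketch that is not taken from the guide itself. It assumes dates are expected in ISO 8601 form (such as YYYY-MM-DD); the SCHEMA mapping, the column names, the sample data and the find_type_errors helper are all hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical schema: column name -> function that converts the raw string
# and raises ValueError when the value is not valid for that column.
SCHEMA = {
    "station": str,
    "measured_on": date.fromisoformat,  # ISO 8601 dates such as 2025-02-15
    "value": float,
}


def find_type_errors(text):
    """Return (line_number, column, raw_value) for each cell that fails conversion."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        for column, convert in SCHEMA.items():
            raw = row.get(column)
            try:
                convert(raw)
            except (TypeError, ValueError):
                # reader.line_num is the number of source lines read so far
                errors.append((reader.line_num, column, raw))
    return errors


# Invented sample with one badly formatted date and one non-numeric value.
SAMPLE = (
    "station,measured_on,value\n"
    "A1,2025-02-15,12.5\n"
    "A2,15/02/2025,13.1\n"
    "A3,2025-02-16,n/a\n"
)

errors = find_type_errors(SAMPLE)
for line_number, column, raw in errors:
    print(f"line {line_number}: column {column!r} has invalid value {raw!r}")
```

Reporting the line number and column for each failure, rather than stopping at the first one, lets a publisher correct all the problems in a file in a single pass before publishing it.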