The five most common technical problems when open data are published

News date: 21-04-2017

The results of the third edition of the Open Data Barometer show that governments, as part of their usual activity, are already collecting and managing a large amount of data in areas as diverse as cartography, land registry, statistics, budgets and public spending, business records, legislation, transport, trade, health, education, crime, environment, election results and public procurement.

In addition, more than three-quarters of these official data are also available online on multiple websites managed by different government agencies. Nevertheless, only a small percentage of all those digitized datasets (around 10%) can actually be considered open data, and the vast majority of them are concentrated in the top 10 countries.

So why are these differences so significant? What prevents the vast majority of the data published by governments from being considered open data? Let’s review the five most common problems, in order of frequency, detected in the study conducted by the Barometer after analyzing 1,380 datasets:

  1. Non-open licenses: Only 18% of the published data is clearly associated with an open license that allows the data to be re-used without any restriction beyond attribution to the original source. This means that for most of the available data the license is restrictive or simply unknown, which prevents re-use.
  2. Incomplete datasets: Only 32% of the analyzed datasets are published in full so that they can be easily downloaded and re-used. Most of the data are currently split up and scattered across multiple sections of the site where they are published, or even across different websites, which makes them considerably harder to find.
  3. Non-machine-readable or hard-to-reuse formats: Just over half of the data (55%) are published in formats that, in addition to being machine-readable, can also be easily reused. The use of standardized formats is very limited, which makes interoperability more difficult. Moreover, proprietary formats predominate, which shuts out users who do not have the necessary software.
  4. Outdated data: Up to 26% of the available data are not updated as often as their nature requires. In addition, in most cases the expected update frequency is not stated either. Outdated data lose much of their interest to potential reusers.
  5. Access fees: It is still necessary to pay a fee in order to have full access to 10% of the published data. This not only limits the public benefits of using such data, but also contributes to the digital divide.
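
As an illustration only, the five problems in this list can be read as a checklist applied to the metadata of each published dataset. The following Python sketch is not the Barometer's methodology: the `DatasetRecord` fields, the `find_problems` function and the two allow-lists of licenses and formats are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical allow-lists, chosen only for illustration.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "ODC-BY-1.0"}
REUSABLE_FORMATS = {"csv", "json", "xml", "geojson"}


@dataclass
class DatasetRecord:
    """Hypothetical metadata describing one published dataset."""
    title: str
    license_id: Optional[str]            # None when no license is stated
    bulk_download: bool                  # True if the full dataset is in one place
    file_format: str
    last_updated: date
    update_interval_days: Optional[int]  # None when no frequency is stated
    access_fee: float                    # 0 means free of charge


def find_problems(record: DatasetRecord, today: date) -> List[str]:
    """Return which of the five problems apply, in the order of the list."""
    problems = []
    if record.license_id not in OPEN_LICENSES:
        problems.append("non-open or unknown license")
    if not record.bulk_download:
        problems.append("incomplete dataset")
    if record.file_format.lower() not in REUSABLE_FORMATS:
        problems.append("format not machine-readable or hard to reuse")
    if record.update_interval_days is None:
        problems.append("update frequency not stated")
    elif (today - record.last_updated).days > record.update_interval_days:
        problems.append("outdated data")
    if record.access_fee > 0:
        problems.append("access fee")
    return problems


# Made-up example record: no stated license, PDF only, yearly updates overdue.
record = DatasetRecord(
    title="Public procurement 2016",
    license_id=None,
    bulk_download=True,
    file_format="PDF",
    last_updated=date(2016, 1, 15),
    update_interval_days=365,
    access_fee=0.0,
)
print(find_problems(record, today=date(2017, 4, 21)))
# ['non-open or unknown license', 'format not machine-readable or hard to reuse', 'outdated data']
```

A dataset for which the function returns an empty list would pass all five checks under these assumptions. A real assessment would need agreed lists of open licenses and reusable formats rather than the placeholders used here.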

All of these problems directly affect the distinctive features of open data, which are the key to its potential. Until all these characteristics are met, it will not be possible to obtain the full social and economic benefits that open data offers.