Searching for high-value data

Publication date 04/12/2019

Description

The new Directive on open data and the reuse of public sector information, adopted in June 2019, will replace and improve on the old Directive 2003/98/EC on the reuse of public sector information. One of the most significant changes in the new Directive is its aim of specifying a list of high-value datasets among those held by public sector bodies.

The creation of such a list is an important milestone because, for the first time in the Directive's 15 years, we will have an explicit, common guide to the minimum datasets that should always be available, as well as the conditions for their reuse throughout the European Union. Those conditions include reuse free of charge, through application programming interfaces (APIs), in a machine-readable format and, where appropriate, with a bulk download option.

The questions that immediately come to mind are: which high-value data does the Directive refer to? And what specific criteria should we apply to identify them?

The Directive defines high-value data as “documents whose reuse is associated with important benefits for society, the environment and the economy, in particular because of their suitability for the creation of value-added services, applications and new, high-quality and decent jobs, and of the number of potential beneficiaries of the value-added services and applications based on those datasets”. This definition offers several clues as to how high-value datasets are expected to be identified, namely through a series of indicators that would include:

Their potential to generate significant social or environmental benefits.
Their potential to generate economic benefits and new income.
Their potential to generate innovative services.
Their potential to benefit a high number of users, in particular SMEs.
Their potential to be combined with other datasets.

In addition, the Commission opened a consultation process some years ago to gauge public opinion on which data should be given priority for publication. The Commission has also drawn on several studies and reference bodies that have published their own recommendations on datasets of high strategic value, such as:

The results of the MEPSIR study on the exploitation of the information resources of the European Union.
The technical annex of the G8 Open Data Charter.
The subject areas that generate business for the infomediary sector in Spain, according to the sector analysis carried out by ONTSI.
The criteria established by the European Commission's ISA program on interoperability solutions.
Standard UNE 178301:2015 on Open Data in Smart Cities.
The data analyzed by the Open Data Barometer and the Global Open Data Index.
The datasets proposed for publication by the Spanish Federation of Municipalities and Provinces (FEMP).

The Directive's annex offers a further clue as to which datasets could finally be selected as high-value, in the form of a list of priority domains that largely coincide with the proposals made by the bodies mentioned above: geospatial, earth observation and environmental, meteorological, statistical, company records and transport data.

It should also be remembered that data on some of these topics are already regulated by specific sectoral legislation, such as Directive 2007/2/EC on spatial data (INSPIRE), Directive 2003/4/EC on environmental information and Directive 2010/40/EU on transport data. Such legislation should therefore also be taken into account when defining the final scope of application.

However, as the new Directive makes clear, the thematic list is not closed and the specific datasets have not yet been defined. In fact, the European Commission has recently commissioned a new impact study precisely to define in detail, and to substantiate, which datasets should ultimately be considered “high-value”. There are also critical voices calling for better-defined analysis criteria for deciding which data will be selected, and for society as a whole to be involved in the process. Fortunately, both the critics and the Commission agree that the solution is to broaden the debate through a series of public and expert consultations, as already reflected in the Directive and in the planned impact study. One example is the debate that will take place at the next edition of the Aporta Meeting, on December 18 in Madrid, whose motto is precisely “Driving high-value data”.

We will therefore have to wait until all the planned studies and consultations are completed to know in detail which high-value data will be subject to mandatory publication in the European Union, although this should happen well ahead of the deadline for transposing the Directive in July 2021.

Content prepared by Carlos Iglesias, Open Data researcher and consultant, World Wide Web Foundation.

The contents and points of view expressed in this publication are the sole responsibility of the author.