High-value data | datos.gob.es

Access to and reuse of environmental information and open data

Blog

The favorable regime of access to environmental information

Environmental legislation has traditionally been characterized by establishing a more beneficial legal regime than that which has inspired the general rules on access to information held by the public sector. Indeed, the Aarhus Convention, adopted in 1998, was an important milestone in recognizing the right of access to environmental information under very advanced legal conditions, imposing relevant obligations on public authorities. Specifically, the Convention starts from an inescapable premise: in order for society to enjoy the right to a healthy environment and fulfill its duty to respect and protect it, it must have relevant access to environmental information. To this end, on the one hand, the right to obtain information held by public authorities was recognized and, on the other, an obligation was established for the latter to make certain information public without prior request.

In execution of said international treaty and, specifically, of the obligations assumed by the European Union through Directive 2003/4/EC of the European Parliament and of the Council, of January 28, 2003, on public access to environmental information, Law 27/2006, of July 18, regulating the rights of access to information, public participation and access to justice in environmental matters, was approved. Unlike the general regime contemplated in Law 19/2013, of December 9, on transparency, access to public information and good governance, Law 27/2006 does not contain any reference to open and reusable formats. However, it does include the following developments:

establishes the obligation to provide the information even when, without having generated it directly in the exercise of its functions, it is in the possession of the entity from which it is requested;
requires that the grounds for refusal of the request for access be interpreted in a restrictive manner, so that in case of doubt when interpreting the exceptions provided by law, access to information must be favored;
for those cases in which the request is not resolved and notified within the established period, the rule of positive silence is applied and, therefore, access will be understood to be granted.

The impact of regulations on open data and reuse of public sector information

As under the previous rules, Directive (EU) 2019/1024 does not apply where the relevant legislation of the Member States limits access. That is not the case in the environmental sector: apart from the cases in which access does not apply, the availability of information is generally well assured. Consequently, except for the legal exceptions to the obligation to provide environmental information, there are no specific restrictions that would stand in the way of its reuse.

Moreover, one of the main novelties of the European legislation is a measure that ultimately obliges Member States to adapt their rules on access to environmental information. Chapter V of the Directive establishes a specific regime for so-called high-value datasets, which, in general, must be available free of charge, in machine-readable format, through APIs and, where appropriate, as a bulk download. This very favorable legal regime is envisaged for, among others, the field of Earth Observation and Environment, although the specific datasets to which it will apply are still pending a decision by the European Commission, following an extensive impact analysis whose final result has yet to be published.

In addition, following the European regulatory model, one of the notable novelties that Royal Decree-Law 24/2021, of November 2, has incorporated into Spanish legislation on the reuse of public sector information concerns high-value data. Specifically, Article 3.ter of Law 37/2007 allows the Ministry of Economic Affairs and Digital Transformation to add further datasets at the national level to those established by the European Commission, taking into account the selection made by the Data Office Division, so that those relating specifically to the environment could be extended where appropriate.

The potential for high-value environmental data

As the European regulation itself points out, the reuse of high-value datasets is seen as a tool to facilitate, among other objectives, the creation and promotion of value-added digital applications and services with the potential to generate considerable benefits for society, the environment and the economy. In this area, open data can play an important role in driving the technological innovation needed to address challenges of enormous relevance such as climate change, deforestation and, in general, environmental conservation.

In addition, the development of digital applications and services can help to revitalize rural areas and promote tourism models that value the knowledge and protection of natural resources, especially given Spain's rich and varied natural heritage. For this it is essential to have specific datasets, particularly with regard to natural areas.

Ultimately, from the perspective and demands of Open Government, making environmental information accessible to the standards of high-value data, in accordance with the regulations on the reuse of public sector information, could significantly strengthen social control over the decisions of public entities and citizen participation. For this, however, it is essential to move beyond the model on which the regulatory framework on access to environmental information has traditionally been based: although it represented a significant advance at the time, the 2006 regulation does not include any reference to the possibilities of technological innovation based on open data.

In short, it seems that the time has come to open a debate about a possible update of the sectoral regulation on access to environmental information in order to comply with the requirements of the legal regime contemplated in Directive (EU) 2019/1024.

Content prepared by Julián Valero, Professor at the University of Murcia and Coordinator of the Research Group "Innovation, Law and Technology" (iDerTec).

The contents and points of view reflected in this publication are the sole responsibility of its author.

22/02/2022

Open data in health and education as high-value data?

Blog

When we think of open data our first intuition is usually directed towards data generated by public sector bodies in the exercise of their functions and made available for reuse by citizens and businesses, i.e. public sector open data or open public data. This is natural, because public sector information represents an extraordinary source of data and the intelligent use of this data, including its processing through artificial intelligence applications, has great transformative potential in all sectors of the economy, as recognised by the European directive on open data and re-use of public sector information.

One of the most interesting novelties introduced by the directive was the initial but expandable definition of six thematic categories of high-value datasets, whose re-use is associated with considerable benefits for society, the environment and the economy. These six areas - Geospatial, Earth Observation and Environment, Meteorology, Statistics, Companies and Company Ownership, and Mobility - are the ones that in 2019 were considered to have the greatest potential for the creation of value-added services and applications based on such datasets. However, seen from 2021, almost a year into the global health crisis, it seems clear that this list misses two key areas with a high potential impact on society, namely health and education.

Indeed, we find that on the one hand, educational institutions are explicitly exempted from some obligations in the directive, and on the other hand, health sector data are hardly mentioned at all. The directive, therefore, does not provide for a development of these two areas that the circumstances of the covid-19 pandemic have brought to the forefront of society's priorities.

The availability of health and education data

Although health systems, both public and private, generate and store an enormous amount of valuable data in people's medical records, the availability of these data is very limited due to the very high complexity of processing them in a secure way. Health-related datasets are usually only available to the entity that generates them, despite the great value that their release could have for the advancement of scientific research.

The same could be said for data generated by student interaction with educational platforms, which is also generally not available as open data. As in the health sector, these datasets are usually only available to their owners, for whom they are a valuable asset for the improvement of the platforms, which is only a small part of their potential value to society.

The directive states that high-value data should be published in open formats that can be freely used, re-used and shared by anyone for any purpose. Furthermore, in order to ensure maximum impact and facilitate re-use, high-value datasets should be made available for re-use with very few legal restrictions and at no cost.

Health data are highly sensitive from a privacy standpoint, so the delicate trade-off between respect for privacy and the need to support the advancement of scientific research must always be kept in mind. Treating health and education data as high-value open data should probably maintain some particular restrictions, given the nature and sensitivity of these data, and promote mechanisms such as the donation of data for research purposes by patients or the exchange of data for the same purpose between researchers. In this sense, the 2018 regulation on data protection introduced the possibility of reusing data for research purposes, provided that the appropriate pseudonymisation measures and the rest of the legally stipulated guarantees are adopted.
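As a rough illustration of what a pseudonymisation measure can look like in practice, the sketch below replaces a direct patient identifier with a keyed hash (HMAC-SHA256). The function name, the record fields and the key are hypothetical and are not taken from any regulation or platform mentioned in this post. Real pseudonymisation also requires further safeguards, such as careful key management and attention to indirect identifiers.

```python
import hashlib
import hmac


def pseudonymise(record: dict, secret_key: bytes, id_field: str = "patient_id") -> dict:
    """Return a copy of the record with the direct identifier replaced by a keyed hash."""
    # The same identifier and key always give the same token, so records
    # belonging to one person can still be linked without exposing the identifier.
    token = hmac.new(
        secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    result = {k: v for k, v in record.items() if k != id_field}
    result["pseudonym"] = token
    return result


# Hypothetical record; the field names are assumptions made for illustration.
example_record = {"patient_id": "ES-000123", "age_band": "60-69", "diagnosis": "covid-19"}
example_key = b"keep-this-key-separate-from-the-data"

print(pseudonymise(example_record, example_key))
```

Because the token depends on a secret key held separately from the data, someone who only has the published records cannot recompute the identifier, while the key holder can still link new records to existing pseudonyms.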

The importance of public-private partnerships

Education and health are two areas where the private sector or public-private partnerships are making exciting strides in converting some of the potential of open data into benefits for society. Open data publishing is not the exclusive preserve of the public sector and there is a long tradition of private-public collaboration, largely channelled through universities. Let us look at some examples:

There are a number of initiatives such as the pioneering UCI Machine Learning Repository, founded in 1987 as a repository of datasets used by the artificial intelligence community for the empirical analysis of machine learning algorithms. This repository has been cited more than 1000 times, which places it among the most cited resources in the computer science domain. In this and other repositories also managed by universities or foundations with donations from private companies, we can also find open datasets released by companies, or datasets in whose creation or development companies have actively collaborated.
Large technology companies, no doubt inspired by these initiatives, also maintain open data search engines or repositories such as Google's dataset search engine, AWS's open data registry, or Microsoft Azure's datasets, where datasets related to health or education are increasingly common.
In terms of data that can contribute to improving education, for example, The Open University publishes OULAD (Open University Learning Analytics Dataset), an open learning analytics dataset containing data on courses, students and their interactions with the virtual learning environment for seven courses. However, there are very few comparable datasets, even though their joint use in projects would undoubtedly allow further progress in areas such as detecting the risk of students dropping out.
As far as the health sector is concerned, it is worth highlighting the case of the Spanish platform HealthData 29, developed by Fundación 29, which aims to create the infrastructure needed to securely publish open health datasets so that they are available to the community for research purposes. As part of this infrastructure, Fundación 29 has published the Health Data Playbook, a guide for the creation, within the current technical and legal framework, of a public repository of data from health systems, so that they can be used in medical research. Microsoft collaborated in the preparation of this guide as technological partner and Garrigues as legal partner, and it is aimed at organisations that carry out health research.

At the moment the platform only has available the Covid Data Save Lives (COVIDDSL) dataset published by the HM Hospitales University Hospital Group, composed of clinical data on interactions recorded in the covid-19 treatment process. However, it is an excellent example of the potential that we may be missing out on globally by not collecting and publishing more and better data on patients diagnosed with covid-19 in a systematised way and on a global scale. The creation of predictive models of disease progression in patients, the development of epidemiological models on the spread of the virus, or the extraction of knowledge on the behaviour of the virus for vaccine development are just some of the use cases that would benefit from greater availability of this data.

Education and health are two of the great concerns of all developed societies in the world because they are closely related to the well-being of their citizens. But perhaps we have never been more aware of this than in the last year and this represents an extraordinary opportunity to drive initiatives that contribute to unlocking more open health and education data. Whether as high-value data or in any other form, these datasets are key to enabling us to better react to future health crisis situations but also to help us overcome the aftermath of the current one.

Content prepared by Jose Luis Marín, Senior Consultant in Data, Strategy, Innovation & Digitalization.

The contents and points of view reflected in this publication are the sole responsibility of its author.

18/02/2021

Searching high-value data

Blog

The new Directive on open data and the reuse of public sector information, which was adopted last June, will replace and improve on the old Directive 2003/98/EC on the reuse of public sector information. Among the most significant changes in this new Directive is the objective of specifying a list of high-value datasets among those held by public sector bodies.

The creation of a list like this is a very important milestone because, for the first time in the Directive's 15 years of existence, we will have an explicit and common guide to the minimum datasets that should always be available, as well as the conditions for their reuse throughout the European Union - which will include their reuse for free, through application programming interfaces (APIs), in a machine-readable format and, where appropriate, including the bulk download option.
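The reuse conditions listed above can be read as a simple checklist. The following minimal sketch assumes a hypothetical metadata record for a dataset. The class, field names and example values are invented for illustration and do not correspond to any official schema or to the Directive's own wording.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetInfo:
    # Hypothetical metadata fields, assumed for this example only.
    title: str
    free_of_charge: bool
    has_api: bool
    machine_readable: bool
    bulk_download: Optional[bool]  # None when bulk download is not applicable


def unmet_conditions(dataset: DatasetInfo) -> List[str]:
    """List which of the four reuse conditions the dataset does not yet meet."""
    missing = []
    if not dataset.free_of_charge:
        missing.append("free of charge")
    if not dataset.has_api:
        missing.append("API access")
    if not dataset.machine_readable:
        missing.append("machine-readable format")
    # Bulk download is only required "where appropriate", so None is accepted.
    if dataset.bulk_download is False:
        missing.append("bulk download")
    return missing


example = DatasetInfo(
    title="Air quality measurements",
    free_of_charge=True,
    has_api=False,
    machine_readable=True,
    bulk_download=None,
)
print(unmet_conditions(example))  # → ['API access']
```

Treating bulk download as optional (None) mirrors the "where appropriate" qualifier in the text. Whether it applies to a given dataset is a judgment this sketch leaves to whoever fills in the record.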

The questions we all ask ourselves immediately are: what are the high-value data they refer to? And what are the specific criteria that we should apply when identifying such high-value data?

The Directive defines high-value data as “documents whose reuse is associated with important benefits for society, the environment and the economy, in particular because of their suitability for the creation of value-added services, applications and new, high-quality and decent jobs, and of the number of potential beneficiaries of the value-added services and applications based on those datasets”. This definition offers several clues as to how these high-value datasets are expected to be identified through a series of indicators that would include:

Their potential to generate significant social or environmental benefits.
Their potential to generate economic benefits and new income.
Their potential to generate innovative services.
Their potential to benefit a high number of users, in particular SMEs.
Their potential to be combined with other datasets.

In addition, the Commission opened a consultation process some years ago that has served to gauge public opinion on the priority of the data to be published. There are also several studies and reference bodies that have inspired the Commission and that have been publishing their own recommendations on datasets of high strategic value, such as:

The results of the MEPSIR study on the exploitation of the information resources of the European Union.
The technical annex of the G8 Open Data Charter.
The subject areas that generate business for the infomediary sector in Spain, according to the analysis of the sector carried out by ONTSI.
The criteria established by the ISA program of interoperability solutions of the European Commission.
Standard UNE 178301:2015 on Open Data in Smart Cities.
The data analyzed by the Open Data Barometer and the Global Open Data Index.
The datasets proposed for publication by the Spanish Federation of Municipalities and Provinces (FEMP).

In addition, the Directive itself offers a further clue in its annex as to which datasets could finally be selected for their high value, through a series of priority domains that largely coincide with the proposals made by the bodies mentioned above: geospatial, earth observation and environmental, meteorological, statistical, company records and transport data.

It should also be remembered that the data related to some of the aforementioned topics are also regulated by specific sectoral legislation - such as Directive 2007/2/EC on spatial data (INSPIRE), Directive 2003/4/EC on environmental information and Directive 2010/40/EU on transport data - and therefore such legislation should also be taken into account when defining the final scope of application.

However, as the new Directive makes clear, the thematic list is not closed and the specific datasets have not yet been defined. In fact, the European Commission has recently commissioned a new impact study precisely to define in detail, and to substantiate, what those “high-value” datasets should finally be. There are also critical voices calling for better-defined analysis criteria when deciding what these data will eventually be, and for involving society as a whole in the process. Fortunately, both the critics and the Commission agree that the solution is to broaden the debate and hold a series of public and expert consultations - as already reflected in the Directive and in the planned impact study - such as the debate that will take place at the next edition of the Aporta Meeting, on December 18 in Madrid, whose motto is precisely “Driving high-value data”.

Therefore, we will still have to wait some time, until all the planned studies and consultations are completed, to know in detail which high-value data will be subject to mandatory publication in the European Union, although this will surely happen with sufficient margin before the deadline for transposing the Directive in July 2021.

Content prepared by Carlos Iglesias, Open Data Researcher and Consultant, World Wide Web Foundation.

Contents and points of view expressed in this publication are the exclusive responsibility of its author.

04/12/2019

The new edition of the Aporta Meeting will focus on high-value data

Event

Preparations for the 9th edition of the Aporta Meeting are already underway. The event will take place on December 18 in Madrid, in the morning (from 9:00 a.m. to 2:30 p.m.), and will focus on high-value data.

High-value data, both public and private, are an extraordinary source of information because of their great impact on citizens. When we talk about high-value data we refer to the categories delimited by Directive (EU) 2019/1024, of June 20, 2019: geospatial, environmental, meteorological, statistical, company and mobility data. Data of this type are a key element in boosting innovative services and generating socio-economic and environmental benefits for the entire population.

The relevance of high-value data, and the community's interest in them, have made them the main theme of the new edition of the Aporta Meeting. Under the slogan "Driving high-value data", the event will address the challenges and opportunities we will have to face in order to take full advantage of the value of this type of data.

The event will be structured around three round tables, each focused on different actors in the data ecosystem: high-value data publishers, accelerators that try to boost their reuse, and companies that generate high-value services and products based on reuse.

Table 1: Towards the availability of high-value data. The first table will bring together representatives of the public administrations that generate high-value data. The objective is to analyse which datasets are already available and their potential applications, as well as which ones should be opened to respond to user demand and under what conditions: in machine-readable formats, downloadable through application programming interfaces (APIs) and in bulk, with the necessary granularity and formats, and under appropriate licenses.
Table 2: Accelerating the use of high-value data. Table two will be a meeting point for projects aimed at boosting the European data-based entrepreneurship ecosystem. To this end, we have invited representatives of business accelerators and initiatives whose common denominator is helping SMEs and data start-ups overcome the barriers they face in order to succeed in the market.
Table 3: New technological paradigms and the importance of data for their development. The last table will bring together agents from the reuse sector, who will discuss the opportunities offered by the availability of high-value data and the challenges that need to be faced to encourage its use.

The full agenda will be available in the coming weeks. You can follow news about the event on social networks, with the hashtag #Aporta2019, and on datos.gob.es.

10/10/2019