The agreement to provide statistical data to researchers, in the context of the Data Governance Regulation

News date: 29-05-2024


The European Union has devised a fundamental strategy to ensure accessible and reusable data for research, innovation and entrepreneurship. Strategic decisions have been made both in a regulatory and in a material sense to build spaces for data sharing and to foster the emergence of intermediaries with the capacity to process information.

European policies give rise to a very diverse ecosystem whose components should be distinguished. On the one hand, open data reuse policies are being deepened. On the other hand, the aim is to open up a space that has been inaccessible until now: data that could not be shared because of the guarantee of the fundamental right to data protection, intellectual property or business secrecy. Today, anonymization technologies, as well as data intermediation technologies, make it possible to process such data with due guarantees. Finally, the aim is to provide resources through the promotion of data spaces, initiatives that propose federative models, such as Gaia X, the European digital infrastructures (EDIC) promoted by the European Commission, and the Digital Innovation Hubs aimed at supporting business and government in this field. This scenario will boost different types of use in research, innovation and entrepreneurship.

This article focuses on the agreement signed by the National Statistics Institute (INE), the State Tax Administration Agency (AEAT), different Social Security bodies, the State Public Employment Service (SEPE) and the Bank of Spain to boost access to data. The agreement is part of this EU strategy, whose principles, rules and conditions must be explained in order to place it in context, underline its importance and understand its implications.

Competing by guaranteeing our rights

The EU competes at a structural disadvantage vis-à-vis the US or the People's Republic of China. On the North American side, the development of disruptive technologies in the context of the Internet and, particularly, the deployment of search engines, social networks and mobile applications have favoured the birth of a data brokerage market in which a few companies have an almost monopolistic power over data. The great champions of the digital world manage information on practically all sectors of activity, thanks to a business model based on the capitalisation or commoditisation of our privacy and to their entry into areas such as health or activity bracelets. Every time a user ran a search, sent an email, commented on a social network or dictated a message to a mobile phone, that action fuelled this position of dominance and underpinned the development of large language models in artificial intelligence or the deployment of algorithmic tools linked to neuroemotional marketing.

On the Chinese side, there is a closed internet model under state control, with a position of participation and surveillance over the large local multinationals in the sector and a global dominance over 5G network traffic. It is a vigilant state that has become the first power in the deployment of artificial intelligence through video surveillance and facial recognition and has a very clear state policy on the deployment of artificial intelligence (AI), creating advantages to compete in this race.

The EU starts from an apparently disadvantageous position. It is not at all a question of lack of talent or high abilities. Much of the Internet and IT ecosystem has been developed in Europe or by European talent. However, our market has not been able to generate conditions that would allow the emergence of major technological champions capable of supporting the entire value chain, from cloud infrastructures to the availability of large volumes of data that feed this ecosystem. Moreover, the EU adopted an ethical, political and legal commitment to freedoms, equity and democracy. This position, which has operated as a kind of barrier in terms of costs and processes, integrates within it the essential requirements for a democratic, inclusive and liberty-guaranteeing digital transformation.

The Data Governance Act

The legal foundation of data sharing is a complex modular structure comprising the General Data Protection Regulation (GDPR), the Open Data and Re-use of Public Sector Information Directive, the Data Governance Act (DGA), the Data Act (DA) and, in the immediate future, the Artificial Intelligence Act and the European Health Data Space Regulation (EHDS). The rules should facilitate the re-use of data, including those under the scope of data protection, intellectual property and business secrecy. Several factors must operate to make this possible, which are set out below:

  1. Data sharing from government should grow exponentially and generate a data market that is currently monopolised by foreign companies.
  2. Digital sovereignty in legal terms will also be a growth driver insofar as it defines market rules based on the philosophy of the European Union centred on the guarantee of fundamental rights. This should have an immediate consequence when defining processes aimed at producing safe and reliable products.
  3. Digital sovereignty will in turn have important technological consequences. Public data spaces, whether promoted from digital hubs or federations of nodes, such as Gaia X, should make data available to the individual researcher or start-up, including application dashboards and technical support.
  4. The result of the regulation is to accelerate and increase the possibilities for freeing and sharing data. The EU and the agreement under discussion seek to release data subject to trade secrecy, intellectual property or, in particular, the protection of personal data, in a secure manner through intermediation processes in secure data environments. This matter has occupied, among others, the Spanish Data Protection Agency and the European Union Agency for Cybersecurity (ENISA). This implies a commitment to anonymisation and/or quasi-anonymisation environments through technologies such as differential privacy, homomorphic encryption or multi-party computation.
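To make one of the technologies named in the last point concrete, here is a minimal sketch of differential privacy applied to a counting query, using the textbook Laplace mechanism. It is an illustration only: the function names, the sample records and the choice of epsilon are assumptions, and nothing here reflects how the signatory institutions actually protect their data.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution centred on zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1 (adding or removing one person
    changes the result by at most 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical microdata: (age, employed) pairs invented for this example.
people = [(34, True), (51, False), (29, True), (45, True), (62, False)]
rng = random.Random(2024)
noisy = dp_count(people, lambda p: p[1], epsilon=1.0, rng=rng)
print(f"Noisy number of employed people: {noisy:.2f}")
```

A smaller epsilon adds more noise and therefore gives stronger protection at the cost of accuracy, which is the trade-off a data holder would have to set for each release.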

All of this is based on the guarantee of fundamental rights and the empowerment of people. GDPR, DGA, DA and EHDS should make it possible to achieve a dual objective: creating a European market for the free movement and re-use of protected data, and ensuring that individuals and organisations can exercise their rights of control and, at the same time, share their data, while also encouraging data altruism. Moreover, the GDPR, DGA, EHDS and the Artificial Intelligence Act define precise limits through prohibitions on use, regulated access conditions and ethically and legally sound design procedures. One idea should be considered central: there is a dimension of public or common interest that, beyond the epic battles of COVID, reaches the small but essential aspirations of the individual researcher, the disruptive entrepreneur, the SME trying to improve its value chain or the Administration innovating processes at the service of people.

Spain commits to the digital transformation of data spaces

The 2025 Plan, the Artificial Intelligence Strategy, the efforts of the Next Generation funds through their Strategic Projects for Economic Recovery and Transformation (PERTE in Spanish), the AI Missions and the Digital Bill of Rights exemplify Spain's alignment and leadership in this field. To make these strategies viable, secure data and process environments are essential. Now, the National Health Data Space has been joined by the agreement between the INE, the AEAT, different Social Security bodies, the SEPE and the Bank of Spain. As its explanatory memorandum states, it constitutes a first and encouraging step towards the deployment of the DGA in our country.

The signatory institutions understand not only the scientific and business value of the statistical information they handle, but also the significant growth in demand and need for it. They also take on a qualitatively relevant issue: the value added by interconnecting datasets. They therefore declare their willingness to maximise the added value of their data by allowing cross-referencing or integration when research is carried out for scientific purposes in the public interest.

The keys to the agreement to provide statistical data to researchers for scientific purposes in the public interest

Some of the questions that may arise with regard to this agreement are answered below.

  • How can the data be accessed?

Access to data goes through a cross-information access request that must be individually accepted by each institution. This takes into account certain assessment criteria regarding the nature of the data and the interest of the proposal.

Facilitating this access implies for the signatory institutions an effort of de-identification and cross-checking carried out by each of them directly or through trusted third parties. The result, "depending on the security level of the resulting file", will entail:

  • direct and autonomous access.
  • processing of the data in one of the secure rooms or centres made available by the signatory entities.
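The decision flow described above can be sketched as follows. This is a simplified model under stated assumptions: the agreement says each institution must accept the request individually and that the outcome depends on the security level of the resulting file, but it does not define a data model, so `AccessRequest`, `AccessMode`, `resolve_access` and the boolean `file_is_low_risk` flag are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AccessMode(Enum):
    DIRECT = "direct and autonomous access"
    SECURE_ROOM = "processing in a secure room or centre"
    DENIED = "request not accepted"


@dataclass(frozen=True)
class AccessRequest:
    project: str
    institutions: tuple[str, ...]  # holders whose data would be cross-referenced


def resolve_access(
    request: AccessRequest,
    decisions: dict[str, bool],
    file_is_low_risk: bool,
) -> AccessMode:
    """Combine the per-institution decisions into a single outcome.

    Assumption: a missing decision counts as "not accepted", and the
    security level of the resulting file is reduced to one boolean.
    """
    # Every institution involved must accept the request individually.
    if not all(decisions.get(name, False) for name in request.institutions):
        return AccessMode.DENIED
    # The access mode then depends on the security level of the file.
    return AccessMode.DIRECT if file_is_low_risk else AccessMode.SECURE_ROOM


request = AccessRequest("labour-market study", ("INE", "AEAT", "SEPE"))
decisions = {"INE": True, "AEAT": True, "SEPE": True}
print(resolve_access(request, decisions, file_is_low_risk=False).value)
```

In practice the assessment involves qualitative criteria about the data and the proposal, so a real procedure would not collapse into a single function call.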

Some of the rooms currently available are:

Also noteworthy is the creation of ES_DataLab, which facilitates access to microdata in an environment that guarantees the confidentiality of the information. It allows cross-referencing data from different participating institutions, such as the INE, the AEAT, the Secretary of State for Social Security and Pensions, the Social Security General Treasury (TGSS), the National Social Security Institute (INSS), the Social Marine Institute (ISM), the Social Security IT Management (GISS), the State Public Employment Service and the Bank of Spain.
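One common way to cross-reference records from different holders without exposing direct identifiers is to replace the identifier with a keyed hash before linking. The sketch below illustrates that general idea only. The source does not describe the technique used by ES_DataLab or the signatories, and the field names, the shared key and the helper functions are all hypothetical. Note that this is pseudonymisation, not anonymisation: the linked records remain personal data under the GDPR.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(rows: list[dict], id_field: str, secret_key: bytes) -> list[dict]:
    """Drop the identifier field and add a pseudonym under the key 'pid'."""
    result = []
    for row in rows:
        clean = {key: value for key, value in row.items() if key != id_field}
        clean["pid"] = pseudonymise(row[id_field], secret_key)
        result.append(clean)
    return result


def cross_reference(left: list[dict], right: list[dict]) -> list[dict]:
    """Join two de-identified datasets on the shared pseudonym."""
    index = {row["pid"]: row for row in right}
    return [{**row, **index[row["pid"]]} for row in left if row["pid"] in index]


# Hypothetical records; the key would be held by a trusted third party.
key = b"example-key-held-by-a-trusted-third-party"
tax_rows = deidentify(
    [{"nid": "A1", "income": 21000}, {"nid": "B2", "income": 34000}], "nid", key
)
job_rows = deidentify([{"nid": "A1", "employed": True}], "nid", key)
linked = cross_reference(tax_rows, job_rows)
print(len(linked), "linked record(s); identifier present:", "nid" in linked[0])
```

Because the same key must be used on both datasets, who holds it matters: keeping it with a trusted third party, as in this sketch, means neither the researcher nor the other holders can reverse the pseudonyms on their own.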

In implementation of the DGA's provisions, the Single National Information Point (NSIP), managed by the General Directorate of Data, has been set up, where citizens, business people or researchers can locate information on protected public sector data. It is available at datos.gob.es.

  • What data is shared?

The volume and typologies of data they handle are truly significant. The press release presenting the agreement stated that it would be possible to access "the microdata bases owned by the INE, the AEAT, the SS and the BE, with the necessary guarantees of security, statistical secrecy, personal data protection and compliance with current legislation. In addition to statistical databases from its surveys, INE may also provide access to administrative registers, both those compiled or coordinated by INE and those under other ownership but which INE uses to compile its statistics (in the latter case consulting all requests for access to the holders of the corresponding registers)".

  • Who can access the data?

In order to grant access to the data, the confidentiality regime applicable to the data requested and its legal framework, the social interest of the results to be obtained in the research, the profile, trajectory and scientific publications of the principal investigator and associated researchers or the history of research projects of the entity backing the project, among other aspects, shall be taken into account.

One of the issues envisaged by the DGA in this area consists of establishing economic considerations that ensure the sustainability of the system. In any case, the third clause of the agreement provides for the possibility of receiving financial consideration from applicants for the services of preparing and making available the data contained in the databases owned by them, in accordance with the provisions of statistical legislation (Article 21.3 of the Law 12/1989 of 9 May 1989 on the Public Statistical Function - LFEP) and in the regulations governing each institution.

  • What challenges do data access requesters and signatories face?

Regardless of the scientific merits of the research proposal, it is essential to appeal to the deploying institutions to significantly increase the quality of their data protection and information security compliance processes. But this will not be enough: the deployment of artificial intelligence requires the incorporation of additional processes, such as those found in the CRUE ICT 360º document of the Conference of Rectors of Spanish Universities, which in 2023 addressed its adoption by universities. While it is true that the Artificial Intelligence Act proposes a scenario of less regulation in basic research, it also requires a high level of ethical deployment. To achieve this, it will be essential to apply principles of artificial intelligence ethics, using the ALTAI model (Assessment List for Trustworthy Artificial Intelligence) or an alternative, and to carry out a Fundamental Rights and Algorithms Impact Assessment (FRAIA). This is without neglecting the high legal requirements for the development of market-oriented systems. Beyond the formal declarations of the agreement, the lessons learned from European projects affirm the need for a procedural framework of evidence-based legal and ethical verification of research projects and of the capacities of institutions requesting access to data.

From the point of view of the signatory institutions, in addition to the challenge of the economic sustainability of the model, foreseen and regulated in the agreement, the need for a regulatory investment strategy seems evident. We have no doubt that each data repository and the processes underpinning it have been subject to a data protection impact assessment and to security methodologies linked to the National Security Scheme. Data protection by design and by default, and compliance with the recommendations on anonymisation and data space management mentioned above, will be further elements to consider. This translates into processes, but also into people - chief data officers, data analysts, other mediators such as data protection officers, etc. - together with a high level of security requirements. On the other hand, the duty of transparency vis-à-vis citizens will require efficient channels and a very precise risk management model in the event of a possible mass exercise of a right to object to processing, without prejudice to its feasibility.

Finally, the Spanish Data Protection Agency should approach this process in a proactive and promotional way without renouncing its role as guarantor of fundamental rights, but contributing to the development of functional solutions. This is not just any agreement but an essential test bed for the future of data research in Spain.

In our opinion, the most exciting statement of these institutions consists of understanding the agreement "as the embryo of the future System of access to data for research for scientific purposes of public interest, which must be in accordance with the Spanish and European strategy on data and the legislation on its governance, within a framework of development of public sector data spaces, and respecting in any case the autonomy and the legal regime applicable to the Banco de España".


Content prepared by Ricard Martínez, Director of the Chair of Privacy and Digital Transformation. Professor, Department of Constitutional Law, Universitat de València. The contents and points of view reflected in this publication are the sole responsibility of its author.