Some data are very valuable but, by their nature, cannot be opened to the public at large. These are confidential data subject to third-party rights that prevent them from being made available through open platforms, yet they may be essential for research that benefits society as a whole, in fields such as medical diagnosis, public policy evaluation, or the detection and prosecution of criminal offences.
In order to facilitate the extraction of value from these data, in compliance with the regulations in force and the rights attached to them, secure processing environments, known as safe rooms, have been made available to researchers. The aim is to enable researchers to request, and subsequently use and integrate, the data contained in certain databases held by organisations in order to carry out scientific work in the public interest, all in a controlled, secure and privacy-preserving manner.
Researchers and institutions with access to the data are therefore obliged to maintain absolute confidentiality and not to disseminate any identifiable information.
In this context, the National Statistics Institute (INE), the State Tax Administration Agency (AEAT), various Social Security bodies, the State Public Employment Service (SEPE) and the Bank of Spain have signed an agreement to boost controlled access to this type of data. The agreement is in line with the European Union's strategy and the Data Governance Regulation, as we explained in this article. One of the advantages of this agreement is that it facilitates the cross-referencing of data from different organisations through ES_DataLab.
ES_DataLab, joint access to multiple databases
ES_DataLab is a restricted microdata laboratory for researchers developing projects for scientific and public interest purposes. Access to the microdata, which come from different databases, takes place in an environment that guarantees the confidentiality of the information, since it does not allow the direct identification of the units.
To access this environment, you must make an application as described here and access will only be valid for the specified period of the research. The process is as follows:
- The researcher must be recognised as a "research entity". There is currently a register of entities (universities, research institutes, research departments of public administrations, etc.) which will be expanded as new organisations apply to join.
- Once accredited, the entity must apply for access to microdata, which requires the submission of a research proposal.
Through ES_DataLab, it is possible to access several sets of microdata, collected at this link. ES_DataLab thus facilitates the cross-referencing of the participating institutions' databases, maximising the value that the data can offer to research.
Below are some examples of the data provided by each of the agencies, either through ES_DataLab for cross-checking with other sources, or in their own secure processing environments.
National Statistics Institute
It currently makes available microdata relating to INE datasets, including:
- Results of surveys that collect information on the labour market insertion of university graduates, the wage structure, the active population, living conditions, health in Spain, etc.
- Statistics on various social and economic aspects, such as marriages or deaths, environmental protection activities, subsidiaries of companies abroad, etc.
- Censuses, both general population and by economic activity (e.g. agricultural census).
The INE, in turn, has its own secure room, which facilitates access to confidential data for statistical analysis for scientific purposes in the public interest.
State Tax Administration Agency
The microdata relating to the databases provided by the AEAT include detailed information on:
- Data on the main items contained in various forms, such as form 100, relating to the annual personal income tax return, form 576, on vehicle registrations, or form 714, on wealth tax, among others.
- Foreign trade statistics, with both total data and data segmented by sector of activity.
Also noteworthy is the contribution of the Institute for Fiscal Studies (Instituto de Estudios Fiscales), which draws on data from the State Tax Administration Agency (Agencia Estatal de la Administración Tributaria). Linked to the Ministry of Finance, the Institute has made a Statistics Area available to the public, as well as its own secure room. Its databases include, for example, personal income tax samples, household panels, income panels and the Spanish sector economic database (BADESPE). The product description and data request protocol are available here.
Social Security
The Social Security grants access to microdata referring to databases such as:
- The Continuous Sample of Working Lives (MCVL), which includes individual, current and historical data on contribution bases, affiliations (working life), pensions, cohabitants, Personal Income Tax (IRPF), etc.
- Social Security affiliates with monthly information on labour relations, by dates of registration and deregistration of companies, type of contract, collective, regime, province, etc.
- Benefits recognised in the previous year, including retirement, permanent disability, temporary disability and childbirth and childcare pensions.
- Other databases, such as various budget settlements, temporary redundancy procedures (ERTE) due to COVID-19, medical examinations by the Social Marine Institute (ISM), or data on student maritime training.
The Social Security secure rooms, available in Madrid, Barcelona and Albacete, allow the processing of this and other protected information by offering access to a series of secure workstations with various programmes and programming languages (SAS, STATA, R, Python and LibreOffice). Remote access is also allowed through secure devices (called "bastioned devices") that are distributed to researchers.
Thanks to these data, it has been possible to carry out studies on the impact of the retirement age on mortality or the use of paternity leave in Spain.
Bank of Spain
ES_DataLab also offers microdata from the Bank of Spain, drawn from databases such as:
- Company databases, containing information on individual companies, consolidated non-financial corporate groups or the structure of corporate groups.
- Macroeconomic data, such as public sector debt or loans to legal entities.
- Other data relating to sustainability indicators or the household panel.
BELab is the secure data laboratory managed by the Banco de España, offering on-site (Madrid) and remote access. Its data have enabled the development of projects on the effects of the minimum wage on Spanish companies, technology management in the textile sector and machine learning applied to credit risk, among others. You can find out about all projects here, both those that have been completed and those that are still in progress.
Boosting the re-use of data through the Data Governance Regulation
All these measures are part of the harmonised approach and processes carried out in implementation of the provisions of the Data Governance Regulation (DGA) to facilitate and encourage the use, for scientific research purposes in the public interest, of data held by public sector bodies. Likewise, in order to encourage the re-use of specific categories of data held by public sector bodies, the Single National Information Point has been set up at datos.gob.es, managed by the Directorate General for Data.
The aim is to contribute to the advancement of scientific research in our country, while protecting the confidentiality of sensitive data. Safe Rooms are an important resource for the re-use of protected data held by the public sector. They enable controlled processing of information, preserve privacy and other data rights, while facilitating compliance with the European Data Governance Regulation.
The European Union has devised a fundamental strategy to ensure accessible and reusable data for research, innovation and entrepreneurship. Strategic decisions have been made both in a regulatory and in a material sense to build spaces for data sharing and to foster the emergence of intermediaries with the capacity to process information.
European policies give rise to a very diverse ecosystem whose parts should be distinguished. On the one hand, there is a deepening of open data reuse policies. On the other hand, the aim is to cover a space that has been out of reach until now: data that were inaccessible because of the guarantee of the fundamental right to data protection, intellectual property or business secrecy. Today, anonymization technologies, as well as data intermediation technologies, make it possible to process them with due guarantees. Finally, the aim is to provide resources through the promotion of data spaces, initiatives that propose federative models, such as Gaia X, or the European digital infrastructures (EDIC) promoted by the European Commission and the Digital Innovation Hubs aimed at promoting business and government in this field. This scenario will boost different types of use in research, innovation and entrepreneurship.
This article focuses on the agreement signed by the National Statistics Institute (INE), the State Tax Administration Agency (AEAT), different Social Security bodies, the State Public Employment Service (SEPE) and the Bank of Spain to boost access to data, which is part of this EU strategy whose principles, rules and conditions must be explained in order to place it in context, underline its importance and understand the implications of the agreement.
Competing by guaranteeing our rights
The EU competes at a structural disadvantage vis-à-vis the US or the People's Republic of China. On the North American side, the development processes of disruptive technologies in the context of the Internet and, particularly, the deployment of search engines, social networks and mobile applications have favoured the birth of a data broking market in which a few companies have an almost monopolistic power over data. The great champions of the digital world manage information on practically all sectors of activity, thanks to a business model based on the capitalisation or commoditisation of our privacy and their entry into sectors such as health or activity bracelets. Every time a user did a search, sent an email, commented on a social network or dictated a message to a mobile phone, it fuelled that position of dominance and underpinned the development of large language models in artificial intelligence or the deployment of algorithmic tools linked to neuroemotional marketing.
On the Chinese side, there is a closed internet model under state control, with a position of participation in and surveillance over the large local multinationals in the sector and a global dominance over 5G network traffic. It is a surveillance state that has become the leading power in the deployment of artificial intelligence (AI) through video surveillance and facial recognition, and it has a very clear state policy on AI deployment, creating advantages to compete in this race.
The EU starts from an apparently disadvantageous position. It is not at all a question of lack of talent or high abilities. Much of the Internet and IT ecosystem has been developed in Europe or by European talent. However, our market has not been able to generate conditions that would allow the emergence of major technological champions capable of supporting the entire value chain, from cloud infrastructures to the availability of large volumes of data that feed this ecosystem. Moreover, the EU adopted an ethical, political and legal commitment to freedoms, equity and democracy. This position, which has operated as a kind of barrier in terms of costs and processes, integrates within it the essential requirements for a democratic, inclusive and liberty-guaranteeing digital transformation.
The Data Governance Act
The legal substratum of data sharing consists of a complex modular structure comprising the General Data Protection Regulation (GDPR), the Open Data and Re-use of Public Sector Information Directive, the Data Governance Act (DGA), the Data Act (DA) and, in the immediate future, the Artificial Intelligence Act and the European Health Data Space Regulation (EHDS). The rules should facilitate the re-use of data, including those under the scope of data protection, intellectual property and business secrecy. Several factors must operate to make this possible, which are set out below:
- Data sharing from government should grow exponentially and generate a data market that is currently monopolised by foreign companies.
- Digital sovereignty in legal terms will also be a growth driver insofar as it defines market rules based on the philosophy of the European Union centred on the guarantee of fundamental rights. This should have an immediate consequence when defining processes aimed at producing safe and reliable products.
- Digital sovereignty will in turn have important technological consequences. Public data spaces, whether promoted from digital hubs or federations of nodes, such as Gaia X, should make data available to the individual researcher or start-up, including application dashboards and technical support.
- The result of the regulation is to accelerate and increase the possibilities for freeing and sharing data. The EU and the agreement under discussion seek to release data subject to trade secrecy, intellectual property or, in particular, the protection of personal data, in a secure manner through intermediation processes in secure data environments. This matter has occupied, among others, the Spanish Data Protection Agency and the European Cybersecurity Agency (ENISA). This implies a commitment to anonymisation and/or quasi-anonymisation environments through technologies such as differential privacy, homomorphic encryption or multi-party computing.
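To give a concrete feel for one of the technologies named in the last point, the following is a minimal sketch of differential privacy applied to a counting query, using the Laplace mechanism. It is an illustration only, not a description of how any of the institutions or secure environments mentioned here process data. The function names, the `epsilon` value and the toy records are assumptions made for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample from Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1 (adding or removing one person
    changes the result by at most 1), so Laplace noise with scale
    1 / epsilon gives epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: one record per person, ages 0 to 99.
toy_records = [{"age": age} for age in range(100)]
noisy_result = private_count(
    toy_records, lambda r: r["age"] >= 65, epsilon=1.0, rng=random.Random()
)
```

A smaller `epsilon` adds more noise and therefore more protection at the cost of accuracy. A real deployment would also need to track the privacy budget spent across repeated queries, which this sketch does not do.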
All of this is based on the guarantee of fundamental rights and the empowerment of people. The GDPR, DGA, DA and EHDS should make it possible to achieve a dual objective: creating a European market for the free movement and re-use of protected data, and ensuring that individuals and organisations can exercise their rights of control and, at the same time, share their data, while also encouraging data altruism. Moreover, the GDPR, DGA, EHDS and the Artificial Intelligence Act define precise limits through prohibitions on use, regulated access conditions and ethically and legally sound design procedures. One idea should be considered central: there is a dimension of public or common interest that, beyond the epic battles of COVID, reaches the small but essential aspirations of the individual researcher, the disruptive entrepreneur, the SME trying to improve its value chain or the Administration innovating processes at the service of people.
Spain commits to the digital transformation of data spaces
The 2025 Plan, the Artificial Intelligence Strategy, the efforts of the Next Generation funds through its Strategic Projects for Economic Recovery and Transformation (PERTES in Spanish), the AI Missions and the Digital Bill of Rights exemplify Spain's alignment and leadership in this field. To make these strategies viable, secure data and process environments are essential. Now, the National Health Data Space has been joined by the agreement between the INE, the AEAT, different Social Security bodies, the SEPE and the Bank of Spain. As its explanatory memorandum states, it constitutes a first and encouraging step towards the deployment of the DGA in our country.
The signatory institutions understand not only the scientific and business value of the statistical information they handle, but also the significant growth in demand and need for it. They also take on a qualitatively relevant issue: the value added by interconnecting datasets. They therefore declare their willingness to maximise the added value of their data by allowing cross-referencing or integration when research is carried out for scientific purposes in the public interest.
The keys to the agreement to provide statistical data to researchers for scientific purposes in the public interest
Some of the questions that may arise with regard to this agreement are answered below.
How can the data be accessed?
Access to data goes through a cross-information access request that must be individually accepted by each institution. This takes into account certain assessment criteria regarding the nature of the data and the interest of the proposal.
Facilitating this access implies for the signatory institutions an effort of de-identification and cross-checking carried out by each of them directly or through trusted third parties. The result, "depending on the security level of the resulting file", will entail:
- Direct and autonomous access.
- Processing of the data in one of the secure rooms or centres made available by the signatory entities.
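The de-identification and cross-checking effort described above can be pictured with a minimal sketch. It assumes, purely for illustration, that a trusted party replaces the direct identifier in each source with a keyed hash (HMAC-SHA256) and that the de-identified files are then joined on that pseudonym. The agreement does not specify this technique. The field names (`national_id`, `pid`), the secret key and the toy rows are all hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Map a direct identifier to a stable pseudonym with a keyed hash."""
    normalised = identifier.strip().upper()
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(rows, id_field: str, secret_key: bytes):
    """Drop the direct identifier from each row and add a pseudonym `pid`."""
    result = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k != id_field}
        clean["pid"] = pseudonymise(row[id_field], secret_key)
        result.append(clean)
    return result


def cross_reference(left, right):
    """Inner join of two de-identified files on the shared pseudonym."""
    index = {row["pid"]: row for row in right}
    return [{**row, **index[row["pid"]]} for row in left if row["pid"] in index]


# Hypothetical toy extracts held by two different institutions.
KEY = b"held-only-by-the-trusted-party"  # assumption: never given to researchers
tax_rows = [{"national_id": "0001A", "income": 21000},
            {"national_id": "0002B", "income": 34000}]
job_rows = [{"national_id": "0002b ", "contract_type": "permanent"},
            {"national_id": "0003C", "contract_type": "temporary"}]

linked = cross_reference(deidentify(tax_rows, "national_id", KEY),
                         deidentify(job_rows, "national_id", KEY))
```

Note that this is pseudonymisation rather than anonymisation: the linked file can still carry re-identification risk through combinations of attributes, which is one reason the resulting security level may call for processing in a secure room rather than direct access.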
Some of the rooms currently available are:
Also noteworthy is the creation of ES_DataLab, which facilitates access to microdata in an environment that guarantees the confidentiality of the information. It allows cross-referencing data from different participating institutions, such as the INE, the AEAT, the Secretary of State for Social Security and Pensions, the Social Security General Treasury (TGSS), the National Social Security Institute (INSS), the Social Marine Institute (ISM), the Social Security IT Management (GISS), the State Public Employment Service and the Bank of Spain.
In implementation of the DGA's provisions, the "Single National Information Point" (NSIP), managed by the General Directorate of Data, has been set up, from where citizens, business people or researchers can locate information on protected public sector data. This item is available at datos.gob.es.
What data is shared?
The volume and typologies of data they handle are truly significant. The press release presenting the agreement stated that it would be possible to access "the microdata bases owned by the INE, the AEAT, the SS and the BE, with the necessary guarantees of security, statistical secrecy, personal data protection and compliance with current legislation. In addition to statistical databases from its surveys, INE may also provide access to administrative registers, both those compiled or coordinated by INE and those under other ownership but which INE uses to compile its statistics (in the latter case consulting all requests for access to the holders of the corresponding registers)".
Who can access the data?
In order to grant access to the data, the confidentiality regime applicable to the data requested and its legal framework, the social interest of the results to be obtained in the research, the profile, trajectory and scientific publications of the principal investigator and associated researchers or the history of research projects of the entity backing the project, among other aspects, shall be taken into account.
One of the issues envisaged by the DGA in this area consists of establishing economic considerations that ensure the sustainability of the system. In any case, the third clause of the agreement provides for the possibility of receiving financial consideration from applicants for the services of preparing and making available the data contained in the databases owned by them, in accordance with the provisions of statistical legislation (Article 21.3 of the Law 12/1989 of 9 May 1989 on the Public Statistical Function - LFEP) and in the regulations governing each institution.
What challenges do data access requesters and signatories face?
Regardless of the scientific conditions of the research proposal, it is essential to call on the institutions deploying these projects to significantly increase the quality of their data protection and information security compliance processes. But this will not be enough: the deployment of artificial intelligence requires the incorporation of additional processes, which can be found in the document of the Conference of Rectors of Spanish Universities, CRUE ICT 360º, addressed in 2023 to universities for their adoption. While it is true that the Artificial Intelligence Act proposes a scenario of less regulation in basic research, it also requires a high level of ethical deployment. To achieve this, it will be essential to apply principles of artificial intelligence ethics, using the ALTAI model (Assessment List for Trustworthy Artificial Intelligence) or an alternative model, together with the Fundamental Rights and Algorithms Impact Assessment (FRAIA). This is without neglecting the high legal requirements for the development of market-oriented systems. Beyond the formal declarations of the agreement, the lessons learned from European projects affirm the need for a procedural framework of evidence-based legal and ethical verification of research projects and of the capacities of institutions requesting access to data.
From the point of view of the signatory institutions, in addition to the challenge of the economic sustainability of the model, foreseen and regulated in the agreement, the need for a regulatory investment strategy seems evident. We have no doubt that each data repository and the processes underpinning it have been subject to a data protection impact assessment and to security methodologies linked to the National Security Scheme. Data protection by design and by default, and compliance with the recommendations on anonymisation and data space management mentioned above, will be further elements to consider. This translates into processes, but also into people (chief data officers, data analysts, other mediators such as data protection officers, etc.), together with a high level of security requirements. On the other hand, the duty of transparency vis-à-vis citizens will require efficient channels and a very precise risk management model in the event of a possible mass exercise of a right to object to processing, without prejudice to its feasibility.
Finally, the Spanish Data Protection Agency should approach this process in a proactive and promotional way without renouncing its role as guarantor of fundamental rights, but contributing to the development of functional solutions. This is not just any agreement but an essential test bed for the future of data research in Spain.
In our opinion, the most exciting statement of these institutions consists of understanding the agreement "as the embryo of the future System of access to data for research for scientific purposes of public interest, which must be in accordance with the Spanish and European strategy on data and the legislation on its governance, within a framework of development of public sector data spaces, and respecting in any case the autonomy and the legal regime applicable to the Banco de España".
Content prepared by Ricard Martínez, Director of the Chair of Privacy and Digital Transformation. Professor, Department of Constitutional Law, Universitat de València. The contents and points of view reflected in this publication are the sole responsibility of its author.