Safe rooms in Spain: What kind of data can researchers access?

News date: 26-08-2024


Some data are very valuable but, by their nature, cannot be opened to the public at large. These are confidential data subject to third-party rights that prevent them from being published on open platforms, yet they may be essential for research that benefits society as a whole, in fields such as medical diagnosis, public policy evaluation, or the detection and prosecution of criminal offences.

To extract value from these data while complying with the regulations in force and the rights attached to them, secure processing environments, known as safe rooms, have been made available to researchers. The aim is to enable researchers to request, and subsequently use and integrate, the data contained in certain databases held by organisations in order to carry out scientific work in the public interest.

All of this takes place in a controlled, secure and privacy-preserving manner. Researchers and institutions with access to the data are therefore obliged to maintain absolute confidentiality and not to disseminate any identifiable information.

In this context, the National Statistics Institute (INE), the State Tax Administration Agency (AEAT), various Social Security bodies, the State Public Employment Service (SEPE) and the Bank of Spain have signed an agreement to boost controlled access to this type of data. The agreement is in line with the European Union's strategy and the Data Governance Regulation, as we explained in this article. One of its advantages is that it facilitates the cross-referencing of data from different organisations through ES_DataLab.

ES_DataLab, joint access to multiple databases

ES_DataLab is a restricted-access microdata laboratory for researchers developing projects for scientific and public interest purposes. Access to the microdata, which come from different databases, takes place in an environment that guarantees the confidentiality of the information, as it does not allow the direct identification of the units.

To access this environment, researchers must submit an application as described here. Access is only valid for the specified period of the research. The process is as follows:

  1. The researcher must be recognised as a "research entity". There is currently a register of entities (universities, research institutes, research departments of public administrations, etc.) which will be expanded as new organisations apply to join.
  2. Once accredited, the entity must apply for access to microdata, which requires the submission of a research proposal.

Through ES_DataLab, it is possible to access several sets of microdata, listed at this link. ES_DataLab also facilitates the cross-referencing of the participating institutions' databases, maximising the value that the data can offer to research.

Below are some examples of the data provided by each of the agencies, either through ES_DataLab for cross-referencing with other sources, or in their own secure processing environments.

National Statistics Institute

The INE currently makes available microdata from its own datasets, including:

  • Results of surveys that collect information on the labour market insertion of university graduates, the wage structure, the active population, living conditions, health in Spain, etc.
  • Statistics on various social and economic aspects, such as marriages or deaths, environmental protection activities, subsidiaries of companies abroad, etc.
  • Censuses, both general population and by economic activity (e.g. agricultural census).

The INE, in turn, has its own secure room, which facilitates access to confidential data for statistical analysis for scientific purposes in the public interest.

State Tax Administration Agency

The microdata relating to the databases provided by the AEAT include detailed information on:

  • Data on the main items contained in various forms, such as form 100, relating to the annual personal income tax return, form 576, on vehicle registrations, or form 714, on wealth tax, among others.
  • Foreign trade statistics, with both total data and data segmented by sector of activity.

Also noteworthy is the contribution of the Institute for Fiscal Studies (Instituto de Estudios Fiscales), which draws on data from the State Tax Administration Agency (Agencia Estatal de la Administración Tributaria). Linked to the Ministry of Finance, the Institute has made a Statistics Area available to the public, as well as its own secure room. Its databases include, for example, personal income tax samples, household panels, income panels and the Spanish sector economic database (BADESPE). The product description and data request protocol are available here.

Social Security

Social Security grants access to microdata from databases such as:

  • The Continuous Sample of Working Lives (MCVL), which includes individual, current and historical data on contribution bases, affiliations (working life), pensions, cohabitants, Personal Income Tax (IRPF), etc.
  • Social Security affiliates, with monthly information on labour relations by dates of registration and deregistration of companies, type of contract, collective, regime, province, etc.
  • Benefits recognised in the previous year, including retirement, permanent disability, temporary disability, and childbirth and childcare benefits.
  • Other databases, such as various budget settlements, temporary redundancy procedures (ERTE) due to COVID-19, medical examinations by the Social Marine Institute (ISM), or data on students in maritime training.

The Social Security secure rooms, available in Madrid, Barcelona and Albacete, allow this and other protected information to be processed on a series of secure workstations equipped with various programs and programming languages (SAS, STATA, R, Python and LibreOffice). Remote access is also allowed through secure devices (called "bastioned devices") that are distributed to researchers.

Thanks to these data, it has been possible to carry out studies on the impact of the retirement age on mortality or the use of paternity leave in Spain.

Bank of Spain

ES_DataLab also provides microdata from the Bank of Spain, drawn from databases such as:

  • Company databases, containing information on individual companies, consolidated non-financial corporate groups or the structure of corporate groups.
  • Macroeconomic data, such as public sector debt or loans to legal entities.
  • Other data relating to sustainability indicators or the household panel.

BELab is the secure data laboratory managed by the Banco de España, offering on-site (Madrid) and remote access. Its data have enabled the development of projects on the effects of the minimum wage on Spanish companies, technology management in the textile sector and machine learning applied to credit risk, among others. You can find out about all projects here, both those that have been completed and those that are still in progress.

Boosting the re-use of data through the Data Governance Regulation

All these measures are part of the harmonised approach and processes adopted to implement the Data Governance Regulation (DGA), which seeks to facilitate and encourage the use, for scientific research purposes in the public interest, of data held by public sector bodies. Likewise, to encourage the re-use of specific categories of data held by public sector bodies, the Single National Information Point has been set up at datos.gob.es and is managed by the Directorate General for Data.

The aim is to contribute to the advancement of scientific research in our country, while protecting the confidentiality of sensitive data. Safe rooms are an important resource for the re-use of protected data held by the public sector. They enable controlled processing of information and preserve privacy and other data rights, while facilitating compliance with the European Data Governance Regulation.