News

The European Data Strategy aims to create a single market where data flows between countries and sectors. In this respect, the public sector holds a large amount of data of value to citizens. Much of this data is made openly available through various open data platforms, but some data are subject to third-party rights, which limits how openly they can be shared. These data can also be of great interest for scientific research purposes.

The existence of numerous administrative registers and public databases, together with the evolution of the technologies used to manage them, has made large amounts of information available in all areas. This information can be used for the benefit of society, and demand for access from researchers is increasing accordingly.

In this regard, the Data Governance Act was published in the Official Journal of the European Union on 3 June 2022. The Act seeks to encourage data sharing in the EU and to promote the so-called data economy. Among other issues, it contemplates the need to develop mechanisms that facilitate the reuse, with all legal guarantees, of data over which third-party rights apply.

One of these mechanisms is the so-called Safe Reading Room, mentioned in the impact assessment carried out prior to the approval of the Act.

What are Safe Rooms?

Safe Rooms are conceived as a single point of contact to support researchers in the re-use of certain protected categories of data held by the public sector. They allow for a controlled processing of the data, while preserving privacy or other rights attached to the data.

In Europe there are various initiatives of this type, such as the CASD (Centre d’Accès Sécurisé aux Données) and the Health Data Hub in France or the Microdata Research Laboratory in Portugal. In Spain we also have several organisations that have already made Safe Reading Rooms available to researchers. Let's look at 3 examples.

3 examples of Safe Rooms for data sharing in Spain

Bank of Spain Data Laboratory (BELab)

The Banco de España facilitates access to high-quality microdata, guaranteeing its confidentiality through Secure Rooms. The data it offers include microdata on individual companies, on Fintech entities, and from the Financial Skills Survey.

Users can access the information both on-site (in Madrid and Barcelona) and remotely, depending on the degree of sensitivity of the information under study. The on-site lab stations, which are isolated without internet access, use Stata, R, Python and Octave for data processing.

To gain access, researchers must submit their CV and an application form explaining the purpose of the research. This application is assessed by a Research Technical Evaluation Committee. If accepted, a series of rules and restrictions are set (timetable, access without a mobile device, etc.).

To guarantee the proper use of the microdata, BELab prepares and supplies the methodological documentation. In addition, technical experts review the work to ensure compliance with the corresponding confidentiality clauses.

Once the work has been completed, the researcher is obliged to cite the source of the data and to send a copy of the resulting study. The researcher also undertakes not to make any attempt to re-identify the natural or legal persons linked to the data under study.

Social Security Research Rooms

Researchers and academics interested in Social Security databases and microdata have at their disposal three Secure Rooms in Madrid, Barcelona and Albacete, which can only be accessed by authorised personnel, without electronic devices. These rooms are equipped with tools such as SAS, STATA, R, Python and Microsoft Office. Remote access is also allowed through secure devices (called "bastioned devices") that are distributed among researchers.

The data available include the Continuous Sample of Working Lives, Monthly Affiliation figures and data on the ERTEs (temporary layoff schemes) related to COVID-19, among others.

As in the case of the Bank of Spain, the interested party will have to send a request by e-mail to solicitudes.sala-investigacion@seg-social.es. A Committee of Experts will evaluate the request. If approved, the necessary data will be prepared, access to which will be allowed through a private personal folder.

The Committee of Experts will also evaluate the outcome of the research, to ensure regulatory compliance. If everything is correct, the study will be published on the Social Security Data Portal.

National Statistics Institute (INE)

The National Statistics Institute is one of the main publishers of open data in our country, but it also holds valuable sensitive data that must be treated with the corresponding confidentiality measures. Access to this information for scientific research purposes follows the protocol foreseen in Regulation (EC) No 223/2009 on European Statistics and in the European Statistics Code of Practice.

This service is intended for researchers working or collaborating in recognised research organisations. The process is similar to the previous cases. An application must be submitted, which will be evaluated by the INE. This request must be as detailed as possible, indicating the variables to be consulted, the geographical-temporal level and the justification of the need for this information. Some of these data may incur costs, as established in the Official State Gazette.

 

These three examples illustrate the importance of Safe Rooms in enabling the reuse of valuable data while guaranteeing the confidentiality and privacy of the information. This allows for more in-depth research, which can generate economic and social good. Intensive use of data also helps boost innovation in the public sector: it makes it easier to test ideas against each other, encourages creativity and makes the most of available resources, within the general framework of a modern, participative and open public administration that is useful for solving or mitigating social problems and challenges.

News

BiodivERsA is a network of organisations focused on research on biodiversity and ecosystems in European countries and territories, to promote their conservation and sustainable management. Among other actions, this network has published a Guidance document on data management, open data, and the production of Data Management Plans in the framework of scientific research.

The document has been developed in the context of Horizon 2020, with the aim of guiding project teams funded through joint transnational research calls in the drafting and development of their Data Management Plan, with a focus on making their data and publications as open as possible.

The report begins with an introduction on the importance of scientific data and its management, the principles of open science, open data and FAIR principles. It then analyses the concepts and needs of data management in the context of this type of internationally funded project, and ends by highlighting a number of interesting tools and resources.

The importance of scientific data and its management

Organising and making data accessible is becoming increasingly important in the world of science, with the aim of improving traceability and encouraging data sharing. This improves the transparency of studies, but also encourages the reuse of data in new research that generates knowledge for the benefit of society.

The guide refers to a survey carried out by CrowdFlower. According to this study, and contrary to what one might expect, data scientists do not spend most of their time building algorithms, exploring data or carrying out predictive analysis. Instead, most of their time goes into cleaning and organizing the data. An improvement in this area would therefore mean a major gain in efficiency, resource optimization and cost reduction.

The authors of the guide stress that science today requires more systematic open access to scientific data and, to this end, they analyse concepts such as data sharing, open access or FAIR principles that scientific data must comply with in order to be shared to its full potential.

Keys and benefits of developing a Data Management Plan (DMP)

A Data Management Plan (DMP) is a document that describes the management cycle of the data that will be collected and processed in the course of a research project. The guide draws on another report to highlight the main benefits of creating and developing a DMP:

  • Increases efficiency during the project
  • Allows data to be collected and stored in a more structured way
  • Prevents or minimises the risk of data loss
  • Allows data to be shared and reused with guarantees
  • Increases the verifiability of the research
  • Extends the longevity of the project by keeping data available even after the project ends

Structure of a Data Management Plan

DMPs are unique: their content, composition and structure can vary greatly as they depend on the project and the data generated. However, to ensure that all aspects are covered, the report proposes a generic structure for a DMP that can be modified or adapted according to the needs of each project. This structure is organized in nine sections, with a series of questions to facilitate its drafting. Some examples of these questions are included below:

  1. Data Managers. Who manages the data? Does the research team include a data expert?
  2. Data Identification & Description. What is the purpose of the research? What data is used and in what format? How often is it collected?
  3. Data Organization & Exchange. How is the data managed? Where is it stored? Who has access to it?
  4. Data Storage & Back-up. What is the strategy for data back-up and storage? How frequently do you do your backups?
  5. Data Sharing, Standards & Metadata. Are you using a file format that is standard to your field? What tools are required to read the data? Is supporting documentation being generated?
  6. Data Restrictions. How open will the data be? Is there a plan to protect or anonymise the data if necessary?
  7. Data Publishing & Licensing. Where and how will the data be released, and under what licenses?
  8. Data Archiving. How will the data be managed when the project ends to ensure its long-term availability? Will it be published with a Digital Object Identifier (DOI)?
  9. Costs. What are the estimated costs of managing the data and how have these costs been accounted for?
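
As a rough illustration only, the nine sections above could be kept as a simple checklist so that a project team can see which questions still lack an answer. The guide itself does not prescribe any software; the names `DMP_SECTIONS`, `DataManagementPlan`, `answer` and `unanswered` are hypothetical and introduced here purely for the sketch, while the section titles and questions are taken from the list above.

```python
from dataclasses import dataclass, field

# Section titles and sample questions copied from the nine-part structure above.
DMP_SECTIONS = {
    "Data Managers": [
        "Who manages the data?",
        "Does the research team include a data expert?",
    ],
    "Data Identification & Description": [
        "What is the purpose of the research?",
        "What data is used and in what format?",
        "How often is it collected?",
    ],
    "Data Organization & Exchange": [
        "How is the data managed?",
        "Where is it stored?",
        "Who has access to it?",
    ],
    "Data Storage & Back-up": [
        "What is the strategy for data back-up and storage?",
        "How frequently do you do your backups?",
    ],
    "Data Sharing, Standards & Metadata": [
        "Are you using a file format that is standard to your field?",
        "What tools are required to read the data?",
        "Is supporting documentation being generated?",
    ],
    "Data Restrictions": [
        "How open will the data be?",
        "Is there a plan to protect or anonymise the data if necessary?",
    ],
    "Data Publishing & Licensing": [
        "Where and how will the data be released, and under what licenses?",
    ],
    "Data Archiving": [
        "How will the data be managed when the project ends?",
        "Will it be published with a Digital Object Identifier (DOI)?",
    ],
    "Costs": [
        "What are the estimated costs and how have they been accounted for?",
    ],
}


@dataclass
class DataManagementPlan:
    """Hypothetical container for the answers of one project's DMP."""

    project: str
    answers: dict = field(default_factory=dict)  # section -> {question: answer}

    def answer(self, section: str, question: str, text: str) -> None:
        """Record an answer; reject sections or questions not in the template."""
        if question not in DMP_SECTIONS.get(section, []):
            raise KeyError(f"Unknown section or question: {section!r} / {question!r}")
        self.answers.setdefault(section, {})[question] = text

    def unanswered(self) -> dict:
        """Return, per section, the questions that are still blank."""
        missing = {}
        for section, questions in DMP_SECTIONS.items():
            given = self.answers.get(section, {})
            open_questions = [q for q in questions if not given.get(q, "").strip()]
            if open_questions:
                missing[section] = open_questions
        return missing


if __name__ == "__main__":
    plan = DataManagementPlan(project="Example project")
    plan.answer("Data Managers", "Who manages the data?", "The project data steward")
    for section, questions in plan.unanswered().items():
        print(f"{section}: {len(questions)} question(s) still open")
```

A team adapting the structure to its own project, as the report suggests, would only need to edit the `DMP_SECTIONS` template; the checklist logic stays the same.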

The report also makes a number of general and practical recommendations which apply to all types of projects and their management plans, such as that free and easily accessible Open Science tools should be used as far as possible or that data generated by the project should be posted on a single website.

Tools and resources

The report ends with a series of tools and resources, such as data repositories or the most important biodiversity data standards. 

Data repositories

Funded research projects must store and make available their project data to other users through the main national and international archives and storage services. 

The report divides the repositories into two main groups. On the one hand, the general repositories, which are open to all research fields, and on the other hand, the specific repositories, which focus on specific subjects. The following table shows some examples of general research repositories.

You can access these repositories through the following links:

The report also shows examples of specific repositories in the area of biodiversity, such as the Arctic Biodiversity Data Service or the Dynamic Ecological Information Management System.

Standards and licences

The use of norms and standards by the biodiversity research community greatly enhances the interoperability of published datasets. Some of the main ones can be found on this website.

Licensing, on the other hand, boosts visibility, especially when using attribution licenses. In this regard, the report provides a list of resources to better understand and address licensing and other publication considerations.

In conclusion, drawing up a data management plan and then depositing the data in the appropriate repositories is of key importance in scientific work: it promotes the reuse of this information and, with it, new research that advances knowledge. The report offers the keys needed to develop such a plan step by step, in a practical and simple way.
