News

The last days of the year are always a good time to look back and assess the progress made. If a few weeks ago we took stock of what happened in the Aporta initiative, now it is time to compile the news related to data sharing, open data and the technologies linked to them.

Six months ago, we already made a first collection of milestones in the sector. On this occasion, we will summarise some of the innovations, improvements and achievements of the last half of the year.

Regulating and driving artificial intelligence

Artificial intelligence (AI) continues to be one of the fields where new advances are being made every day. This is a relatively new and booming sector in need of regulation. Therefore, last July, the European Union published the Artificial Intelligence Regulation, a standard that will shape the European and global regulatory environment. Aligned with Europe, Spain had already presented its new Artificial Intelligence Strategy 2024 a few months earlier, with the aim of establishing a framework to accelerate the development and expansion of AI in Spain.

On the other hand, in October, Spain took over the co-presidency of the Open Government Partnership. Its roadmap includes promoting innovative ideas, taking advantage of the opportunities offered by open data and artificial intelligence. As part of the position, Spain will host the next OGP World Summit in Vitoria.

Innovative new data-driven tools

Data drives a host of disruptive technological tools that can generate benefits for all citizens. Some of those launched by public bodies in recent months include:

  • The Ministry of Transport and Sustainable Mobility has started to use Big Data technology to analyse road traffic and improve investments and road safety.
  • The Principality of Asturias has announced a plan to use Artificial Intelligence to end traffic jams during the summer, through the development of a digital twin.
  • The Government of Aragon presented a new tourism intelligence system, which uses Big Data and AI to improve decision-making in the sector.
  • The Region of Murcia has launched “Murcia Business Insight”, a business intelligence application that allows dynamic analysis of data on the region's companies: turnover, employment, location, sector of activity, etc.
  • The Granada City Council has used Artificial Intelligence to improve sewerage. The aim is to achieve "more efficient" maintenance planning and execution, with on-site data.
  • The Segovia City Council and Visa have signed a collaboration agreement to develop an online tool with real, aggregated and anonymous data on the spending patterns of foreign Visa cardholders in the capital. This initiative will provide relevant information to help tailor strategies to promote international tourism.

Researchers and students from various centers have also reported advances resulting from working with data:

  • Researchers from the Center for Genomic Regulation (CRG) in Barcelona, the University of the Basque Country (UPV/EHU), the Donostia International Physics Center (DIPC) and the Fundación Biofísica Bizkaia have trained an algorithm to detect tissue alterations in the early stages and improve cancer diagnosis.
  • Researchers from the Spanish National Research Council (CSIC) and KIDO Dynamics have launched a project to extract metadata from mobile antennas to understand the flow of people in natural landscapes. The objective is to identify and monitor the impact of tourism.
  • A student at the University of Valladolid (UVa) has designed a project to improve the management and analysis of forest ecosystems in Spain at the local level, by converting municipal boundaries into a linked open data format. The results are available for re-use.

Advances in data spaces

The Ministry for Digital Transformation and the Civil Service, and specifically the Secretariat of State for Digitalisation and Artificial Intelligence, continues to make progress in the implementation of data spaces through various actions:

  • A Plan for the Promotion of Sectoral Data Spaces has been presented to promote secure data sharing.
  • The development of Data Spaces for Intelligent Urban Infrastructures (EDINT) has been launched. This project, which will be carried out through the Spanish Federation of Municipalities and Provinces (FEMP), contemplates the creation of a multi-sectoral data space that will bring together all the information collected by local entities.
  • In the field of digitalisation, aid has been launched for the digital transformation of strategic productive sectors through the development of technological products and services for data spaces.

Functionalities that bring data closer to reusers

The open data platforms of the various agencies have also introduced new developments, such as new datasets, functionalities, strategies or reports:

  • The Ministry for Ecological Transition and the Demographic Challenge has launched a new application for viewing the National Air Quality Index (AQI) in real time. It includes health recommendations for the general population and the sensitive population.
  • The Andalusian Government has published a "Guide for the design of Public Policy Pilot Studies". It proposes a methodology for designing pilot studies and a system for collecting evidence for decision-making.
  • The Government of Catalonia has initiated steps to implement a new data governance model that will improve relations with citizens and companies.
  • The Madrid City Council is implementing a new 3D cartography and thermal map. In the Blog IDEE (Spatial Data Infrastructure of Spain) they explain how this 3D model of the capital was created using various data capture technologies.
  • The Canary Islands Statistics Institute (ISTAC) has published 6,527 thematic maps with labor indicators on the Canary Islands in its open data catalog.
  • The Barcelona Open Data Initiative and the Democratic Union of Pensioners and Retirees of Spain, with support from the Ministry of Social Rights, Consumption and Agenda 2030, presented the first website of the Data Observatory x Seniors. Its aim is to facilitate the analysis of healthy ageing in Spain and strategic decision-making. The initiative also launched a challenge to identify 50 datasets related to healthy ageing, a project supported by the Barcelona Provincial Council.
  • The Centre for Technological Development and Innovation (CDTI) has presented a dashboard in beta phase with open data in exploitable format.

In addition, work continues to promote the opening up of data from various institutions:

  • Asedie and the King Juan Carlos University (Madrid) have launched the Open Data Reuse Observatory to promote the reuse of open data. It already has the commitment of the Madrid City Council and they are looking for more institutions to join their Manifesto.
  • The Cabildo of Tenerife and the University of La Laguna have developed a Sustainable Mobility Strategy in the Macizo de Anaga Biosphere Reserve. The aim is to obtain real-time data in order to take measures adapted to demand.

Data competitions and events to encourage the use of open data

Summer was the time chosen by various public bodies to launch competitions for products and/or services based on open data. This is the case of:

  • The Community of Madrid held DATAMAD 2024 at the Universidad Rey Juan Carlos de Madrid. The event included a workshop on how to reuse open data and a datathon.
  • More than 200 students registered for the I Malackathon, organised by the University of Malaga, a competition that awarded projects that used open data to propose solutions for water resource management.
  • The Junta de Castilla y León held the VIII Open Data Competition, whose winners were announced in November.
  • The II UniversiData Datathon was also launched. 16 finalists have been selected. The winners will be announced on 13 February 2025.
  • The Cabildo of Tenerife also organised its I Open Data Competition: Ideas for reuse. They are currently evaluating the applications received. They will later launch their 2nd Open Data Competition: APP development.
  • The Government of Euskadi held its V Open Data Competition. The finalists in both the Applications and Ideas categories are now known.

These months have also seen multiple events, many of which can be viewed online.

Other examples of events that were held but are not available online include the III Congress & XIV Conference of R Users, the Novagob 2024 Public Innovation Congress, DATAGRI 2024 and the Data Governance for Local Entities Conference, among others.

These are just a few examples of the activity carried out during the last six months in the Spanish data ecosystem. We encourage you to share other experiences you know of in the comments or via our email address dinamizacion@datos.gob.es.

News

The Ministry for Digital Transformation and the Civil Service has presented an ambitious Plan for the Promotion of Sectoral Data Spaces. Its objective is to foster innovation and improve competitiveness and added value in all economic sectors, promoting the deployment of data spaces where data can be securely shared. Thanks to them, companies, and the economy in general, will be able to benefit from the full potential of a European single market for data.

The Plan has a budget of 500 million euros from the Recovery, Transformation and Resilience Plan and will be deployed through 6 axes and 11 initiatives, with a planned duration until 2026.

Data spaces

Data sharing in data spaces offers enormous benefits to all the participating companies, both individually and collectively. These benefits include improved efficiency, cost reduction, increased competitiveness, innovation in business models and better adaptation to regulations. They cannot be achieved by companies in isolation; they require the sharing of data among all the actors involved.

Some examples of these benefits, sector by sector, would be:

  • Tourism: capacity planning, marketing optimisation, improved tourism experience.
  • Environment: product traceability, carbon footprint measurement.
  • Energy: energy consumption optimisation, demand prediction and production adjustment.
  • Media: copyright protection, fake news detection, content personalisation.
  • Health: collaborative research, epidemic monitoring, sharing of patient information.
  • Sustainable mobility: route optimisation, multimodal transport, supply-demand alignment.
  • Agri-food: farm productivity improvement, water resource optimisation, collective purchasing.
  • Manufacturing: supply chain optimisation, predictive maintenance, collaborative project planning.

Figure 1. Impact of data spaces on various sectors.

Some specific initiatives include: 

  • The AgriDataSpace project ensures food quality and safety through full traceability of products.
  • The Mobility Data Space project improves urban planning and transportation efficiency by integrating mobility data.

Benefits of the Plan for the Promotion of Sectoral Data Spaces

The Plan will offer more than €287 million in grants for the creation and maintenance of data spaces, the development of high-value use cases and the reduction of costs for participating companies when consuming, sharing or providing data. It will also offer up to €44 million in grants for the technology industry, to facilitate the adaptation of its digital products and services to the needs of data spaces and of the entities that participate in them by sharing data, making Spanish industry more competitive in data technologies.

Finally, with a budget of up to €169 million, several unique projects of public interest will be developed to act as enablers of a data-centred digital transformation across all economic sectors. These enablers will help accelerate the deployment of use cases and data spaces, and will encourage companies to actively share data and obtain the expected benefits. To this end, a network of common infrastructures and data space demonstrators will be developed, a National Reference Centre for data spaces will be set up, and the non-open public datasets held by public administrations that are of high interest to businesses will be made available to the economic sectors.

Learn more about the Plan and its measures:

  • Plan
  • Press dossier

The set of initiatives to be developed by the Plan is summarized in the following table:

Axis 1 (total budget: €160 million, 32% of the total):
  • #01 Demonstrators and use cases: €110 million
  • #02 Use cases for the tourism sector: €50 million

Axis 2 (total budget: €127 million, 25% of the total):
  • #03 Data Space Kit: €127 million

Axis 3 (total budget: €44 million, 9% of the total):
  • #04 Technological products and services for data spaces: €44 million

Axis 4 (total budget: €20 million, 4% of the total):
  • #05 Public data demand management: €20 million

Axis 5 (total budget: €139 million, 28% of the total):
  • #01 Demonstrators and use cases: €40 million
  • #06 Tourism Data Space Platform: €35 million
  • #07 New Language Economy Data Space: €12 million
  • #08 Smart Urban Infrastructures Data Space: €13 million
  • #09 Regional Development Data Space: €39 million

Axis 6 (total budget: €5 million, 1% of the total):
  • #10 Communication and awareness: €5 million

  • #11 Reference Centre for Sectoral Data Spaces: €5 million (1% of the total)

Figure 2. Summary table with the initiatives included in the Plan for the Promotion of Sectoral Data Spaces.

Discover the grants that are currently active, and the planned schedule to benefit from them:

  • 2nd Call for Demonstrators and Use Cases: €65 million. Call for proposals in December 2024.
  • Products and services: €44 million. Call for proposals in December 2024.
  • Data Spaces Kit: €127 million. In progress; expected in January 2025.

More information about data spaces here.


News

Tourism is one of Spain's economic engines. In 2022 it accounted for 11.6% of Gross Domestic Product (GDP), exceeding €155 billion, according to the Instituto Nacional de Estadística (INE). This figure grew to €188 billion and 12.8% of GDP in 2023, according to Exceltur, an association of companies in the sector. In addition, Spain is a very popular destination for foreign visitors, ranking second in the world and still growing: in 2024 it is expected to welcome a record 95 million international visitors.

In this context, the Secretariat of State for Tourism (SETUR), in line with European policies, is developing actions aimed at creating new technological tools for the Network of Smart Tourist Destinations, through SEGITTUR (Sociedad Mercantil Estatal para la Gestión de la Innovación y las Tecnologías Turísticas), the body in charge of promoting innovation (R&D&I) in this industry. It does this by working with both the public and private sectors, promoting:

  • Sustainable and more competitive management models.
  • The management and creation of smart destinations.
  • The export of Spanish technology to the rest of the world.

These are all activities where data - and the knowledge that can be extracted from it - play a major role. In this post, we will review some of the actions SEGITTUR is carrying out to promote data sharing and openness, as well as its reuse. The aim is to assist not only in decision-making, but also in the development of innovative products and services that will continue to position our country at the forefront of world tourism.

Dataestur, an open data portal

Dataestur is a web space that brings together open data on national tourism in a single environment. Users can find figures from a variety of public and private information sources.

The data are structured in six categories:

  • General: international tourist arrivals, tourism expenditure, resident tourism survey, world tourism barometer, broadband coverage data, etc.
  • Economy: tourism revenues, contribution to GDP, tourism employment (job seekers, unemployment and contracts), etc.
  • Transport: air passengers, scheduled air capacity, passenger traffic by ports, rail and road, etc.
  • Accommodation: hotel occupancy, accommodation prices and profitability indicators for the hotel sector, etc.
  • Sustainability: air quality, nature protection, climate values, water quality in bathing areas, etc.
  • Knowledge: active listening reports, visitor behaviour and perception, scientific tourism journals, etc.

The data is available for download via API.
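For developers, a minimal sketch of programmatic access could look like the snippet below. The endpoint path, resource name and parameters are illustrative placeholders, not Dataestur's actual API specification, which should be checked on the portal itself.

```python
# Minimal sketch of consuming an open data API such as Dataestur's.
# The base URL, resource name and parameters are hypothetical placeholders.
import requests

BASE_URL = "https://www.dataestur.es/api/example"  # hypothetical endpoint


def fetch_dataset(resource: str, fmt: str = "json") -> list[dict]:
    """Download an open dataset and return its records as a list of dicts."""
    response = requests.get(f"{BASE_URL}/{resource}", params={"format": fmt}, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


if __name__ == "__main__":
    records = fetch_dataset("international-tourist-arrivals")
    print(f"Downloaded {len(records)} records")
```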

Dataestur is part of a more ambitious project in which data analysis is the basis for improving tourist knowledge, through actions with a wide scope, such as those we will see below.

Developing an Intelligent Destination Platform (IDP)

As part of the milestones set by the Next Generation funds, and within the Digital Transformation Plan for Tourist Destinations, the Secretariat of State for Tourism, through SEGITTUR, is developing an Intelligent Destination Platform (IDP). It is a platform-node that brings together the supply of tourism services and facilitates the interoperability of public and private operators. Thanks to this platform, it will be possible to provide services that integrate and link data from both public and private sources.

Some of the challenges of the Spanish tourism ecosystem to which the IDP responds are:

  • Encourage the integration and development of the tourism ecosystem (academia, entrepreneurs, businesses, etc.) around data intelligence, ensuring technological alignment, interoperability and a common language.
  • Promote the use of the data economy to improve the generation, aggregation and sharing of knowledge in the Spanish tourism sector, driving its digital transformation.
  • Contribute to the proper management of tourist flows and tourist hotspots in the public space, improving the response to citizens' problems and offering real-time information for tourism management.
  • Generate a notable impact on tourists, residents and companies, as well as other agents, enhancing the "sustainable tourism country" brand throughout the travel cycle (before, during and after).
  • Establish a reference framework to agree on targets and metrics that drive sustainability and carbon footprint reduction in the tourism industry, promoting sustainable practices and the integration of clean technologies.


Figure 1. Objectives of the Intelligent Destination Platform (IDP).

New use cases and methodologies to implement them

To further harmonise data management, up to 25 use cases have been defined that enable different industry verticals to work in a coordinated manner. These verticals include areas such as wine tourism, thermal tourism, beach management, data provider hotels, impact indicators, cruises, sports tourism, etc.

To implement these use cases, a five-step methodology is followed that seeks to align industry practices with a more structured approach to data (a minimal sketch follows the list):

  1. Identify the public problems to be solved.
  2. Identify what data need to be available in order to solve them.
  3. Model these data to define a common nomenclature, definitions and relationships.
  4. Define what technology needs to be deployed to capture or generate such data.
  5. Analyse what intervention capacities, both public and private, are needed to solve the problem.
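As a purely illustrative exercise, these five steps can be captured in a simple data structure so that each use case is documented in a consistent way. The field names and the sample "beach management" entry below are hypothetical and do not correspond to SEGITTUR's actual working documents.

```python
# Illustrative structure following the five-step methodology described above.
# Field names and the sample use case are hypothetical.
from dataclasses import dataclass, field


@dataclass
class UseCase:
    public_problem: str                                        # 1. problem to be solved
    required_data: list[str]                                   # 2. data needed to solve it
    data_model: dict[str, str] = field(default_factory=dict)   # 3. common nomenclature and relationships
    technology: list[str] = field(default_factory=list)        # 4. technology to capture or generate the data
    interventions: list[str] = field(default_factory=list)     # 5. public/private intervention capacities


beach_management = UseCase(
    public_problem="Overcrowding on urban beaches in high season",
    required_data=["hourly visitor counts", "parking occupancy", "weather forecast"],
    data_model={"VisitorCount": "observations per beach and hour"},
    technology=["people-counting sensors", "open meteorological APIs"],
    interventions=["dynamic signage", "shuttle bus reinforcement"],
)
```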

Boosting interoperability through a common ontology and data space

As a result of the definition of these 25 use cases, an ontology of tourism has been created, which SEGITTUR hopes will serve as a global reference. The ontology is intended to have a significant impact on the tourism sector, offering a series of benefits (a conceptual sketch follows this list):

  • Interoperability: The ontology is essential to establish a homogeneous data structure and enable global interoperability, facilitating information integration and data exchange between platforms and countries. By providing a common language, definitions and a unified conceptual structure, data can be comparable and usable anywhere in the world. Tourism destinations and the business community can communicate more effectively and agilely, fostering closer collaboration.
  • Digital transformation: By fostering the development of advanced technologies, such as artificial intelligence, tourism companies, the innovation ecosystem or academia can analyse large volumes of data more efficiently. This is mainly due to the quality of the information available and the systems' better understanding of the context in which they operate.
  • Tourism competitiveness: In line with the previous point, the implementation of this ontology helps to eliminate inequalities in the use and application of technology within the sector. By facilitating access to advanced digital tools, both public institutions and private companies can make more informed and strategic decisions. This not only raises the quality of the services offered, but also boosts the productivity and competitiveness of the Spanish tourism sector in an increasingly demanding global market.
  • Tourist experience: Thanks to the ontology, it is possible to offer recommendations tailored to the individual preferences of each traveller. This is achieved through more accurate profiling based on demographic and behavioural characteristics, as well as specific motivations related to different types of tourism. By personalising offers and services, customer satisfaction before, during and after the trip is improved, and greater loyalty to tourist destinations is fostered.
  • Governance: The ontology model is designed to evolve and adapt as new use cases emerge in response to changing market demands. SEGITTUR is actively working to establish a governance model that promotes effective collaboration between public and private institutions, as well as with the technology sector.
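To make the interoperability benefit more tangible, the sketch below shows how data published against a shared vocabulary can be expressed as a graph using the rdflib library. The namespace, class and property names (Destination, hasOccupancyRate) are invented for illustration; the actual ontology defines its own vocabulary.

```python
# Conceptual sketch: two publishers using the same ontology terms produce
# graphs that can be merged and queried together. Namespace and terms are
# hypothetical, not those of the real tourism ontology.
from rdflib import Graph, Literal, Namespace, RDF, XSD

TUR = Namespace("https://example.org/tourism-ontology#")  # hypothetical namespace

g = Graph()
g.bind("tur", TUR)

destination = TUR["Destination/Valencia"]
g.add((destination, RDF.type, TUR.Destination))
g.add((destination, TUR.hasOccupancyRate, Literal(0.82, datatype=XSD.decimal)))

# Because every publisher uses the same classes and properties, this graph can
# be combined with data from any other destination without extra mapping work.
print(g.serialize(format="turtle"))
```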

In addition, to solve complex problems that require the sharing of data from different sources, the Open Innovation Platform (PIA) has been created, a data space that facilitates collaboration between the different actors in the tourism ecosystem, both public and private. This platform enables secure and efficient data sharing, empowering data-driven decision making. The PIA promotes a collaborative environment where open and private data is shared to create joint solutions to address specific industry challenges, such as sustainability, personalisation of the tourism experience or environmental impact management.

Building consensus

SEGITTUR is also carrying out various initiatives to achieve the necessary consensus on the collection, management and analysis of tourism-related data, through collaboration between public and private actors. To this end, the Ente Promotor de la Plataforma Inteligente de Destinos was created in 2021; it plays a fundamental role in bringing together different actors to coordinate efforts and agree on broad lines and guidelines in the field of tourism data.

In summary, Spain is making progress in the collection, management and analysis of tourism data through coordination between public and private actors, using advanced methodologies and tools such as the creation of ontologies, use cases and collaborative platforms such as PIA that ensure efficient and consensual management of the sector.

All this is not only modernising the Spanish tourism sector, but also laying the foundations for a smarter, more connected and efficient future. With its focus on interoperability, digital transformation and personalisation of experiences, Spain is positioned as a leader in tourism innovation, ready to face the technological challenges of tomorrow.

News

The Ministry for Digital Transformation and the Civil Service has launched a grant for the development of Data Spaces for Intelligent Urban Infrastructures (EDINT). This project envisages the creation of a multi-sectoral data space that will bring together all the information collected by local authorities. The project will be carried out through the Spanish Federation of Municipalities and Provinces (FEMP) and will receive a subsidy of 13 million euros, as stated in the Official State Gazette published on Wednesday 16 October.

A single point of access to smart urban infrastructure data

Thanks to this action, it will be possible to finance, develop and manage a multisectoral data space that will bring together all the information collected by the different Spanish municipalities in an aggregated and centralized manner. It should be recalled that data spaces enable the voluntary sharing of information in an environment of sovereignty, trust and security, established through integrated governance, organisational, regulatory and technical mechanisms.

EDINT will act as a single neutral point of access to smart city information, enabling companies, researchers and administrations to access information without the need to visit the data infrastructure of each municipality, increasing agility and reducing costs. In addition, it will allow connection with other sectoral data spaces.

The sharing of this data will help to accelerate technological innovation processes in smart city products and services. Businesses and organisations will also be able to use the data for the improvement of processes and efficiency of their activities.

The Spanish Federation of Municipalities and Provinces (FEMP) will implement the project.

The EDINT project will be articulated through the Spanish Federation of Municipalities and Provinces. The FEMP reaches more than 95% of the Spanish population, which gives it a deep and close knowledge of the needs and challenges of data management in Spanish municipalities and provinces.

Among the actions to be carried out are:

  • Development and implementation of the data infrastructure and platform, which will store data from existing Smart City systems.
  • Incorporation of local entities and companies interested in accessing the data space.
  • Development of three use cases on the data space, focusing on the following areas: "smart mobility", "managed cities and territories" and "mapping the economic and social activity of cities and territories".
  • Definition of the governance schemes that will regulate the operation of the project, guaranteeing the interoperability of the data, as well as the management of the complex network of stakeholders (companies, academic institutions and governmental organisations).
  • Setting up Centres of Excellence and Data Offices, with physical workspaces. These centres will be responsible for the collection of lessons learned and the development of new use cases.

It is an ongoing, sustainable, long-term project that will be open at any time to the participation of new actors, whether data providers or data consumers.

A project aligned with Europe

This assistance is part of the Recovery, Transformation and Resilience Plan, funded by the European Union-Next Generation EU. The creation of data spaces is envisaged in the European Data Strategy, as a mechanism to establish a common data market to ensure the European Union's leadership in the global data economy. In particular, it aims to achieve the free flow of information for the benefit of businesses, researchers and public administrations.

Moreover, data spaces are a key area of the Digital Spain 2026 Agenda, which is driving, among other issues, the acceleration of the digitalisation processes of the productive fabric. To this end, sectoral and data-intensive digitalisation projects are being developed, especially in strategic economic sectors for the country, such as agri-food, mobility, health, tourism, industry, commerce and energy.

The launch of the EDINT project joins other previously launched initiatives such as funding and development grants for use cases and data space demonstrators, which encourage the promotion of public-private sectoral innovation ecosystems.

Sharing data under conditions of sovereignty, control and security not only allows local governments to improve efficiency and decision-making, but also drives the creation of creative solutions to various urban challenges, such as optimising traffic or improving public services. In this sense, actions such as the Data Spaces for Smart Urban Infrastructures represent a step forward in achieving smarter, more sustainable and efficient cities for all citizens.

Blog

The strong commitment to common data spaces at European level is one of the main axes of the European Data Strategy adopted in 2020. This approach was already announced in that document as a basis, on the one hand, to support public policy momentum and, on the other hand, to facilitate the development of innovative products and services based on data intelligence and machine learning.

However, the availability of large sectoral datasets required, as an unavoidable prerequisite, an appropriate cross-cutting regulatory framework to establish the conditions for feasibility and security from a legal perspective. In this regard, once the reform of the regulation on the re-use of public sector information had been consolidated, with major innovations such as high-value data, the regulation on data governance was approved in 2022 and then, in 2023, the so-called Data Act. With these initiatives already approved and the recent official publication of the Artificial Intelligence Regulation, the promotion of data spaces is of particular importance, especially in the public sector, in order to ensure the availability of sufficient and quality data.

Data spaces: diversity in their configuration and regulation

The European Data Strategy already envisaged the creation of common European data spaces in a number of sectors and areas of public interest, but at the same time did not rule out the launching of new ones. In fact, in recent years, new spaces have been announced, so that the current number has increased significantly, as we shall see below.

The main reason for data spaces is to facilitate the sharing and exchange of reliable and secure data in strategic economic sectors and areas of public interest. Thus, it is not simply a matter of promoting large datasets but, above all, of supporting initiatives that offer data accessibility according to suitable governance models that, ultimately, allow the interoperability of data throughout the European Union on the basis of appropriate technological infrastructures.

Although general characterisations of data spaces can be offered on the basis of a number of common notes, there is a great diversity from a legal perspective in terms of the purposes they pursue, the conditions under which data are shared and, in particular, the subjects involved.

This heterogeneity is also present in spaces related to the public sector, i.e. those in which there is a prominent role for data generated by administrations and other public entities in the exercise of their functions, to which, therefore, the regulation on reuse and open data approved in 2019 is fully applicable.

Which are the European public sector data spaces?

In early 2024, the second version of a European Commission working document was published with the dual objective of providing an updated overview of the European policy framework for data spaces and also identifying European data space initiatives to assess their maturity and the main challenges ahead for each of them.

In particular, as far as public administrations are concerned, four data spaces are envisaged: the legal data space, the public procurement data space, the data space linked to the technical "once only" system in the context of eGovernment and, finally, the security data space for innovation. These are very diverse initiatives which, moreover, present an uneven degree of maturity, so that some have an advanced level of development and solid institutional support, while other cases are only initially sketched out and have considerable effort ahead for their design and implementation.

Let us take a closer look at each of these spaces referred to in the working paper.

1. Legal data space

It is a data space linked to legislation and case law generated by both the European Union and the Member States. The aim of this initiative is to support the legal professions and public administrations and, in general, to facilitate access for society as a whole in order to strengthen the mechanisms of the rule of law. This space has so far been based on two specific initiatives:

  • One concerns information on officially published legislation, which has been articulated through the European Legislation Identifier (ELI). This European standard facilitates the identification of rules in a stable and easily reusable way, describing legislation with a set of automatically processable metadata according to a recommended ontology.
  • The second concerns decisions taken by judicial bodies, which are made accessible through a European system of unique identifiers, the ECLI (European Case Law Identifier), assigned to the decisions of both European and national judicial bodies (a parsing sketch follows this list).
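To illustrate how such identifiers support automated processing, the sketch below splits an ECLI into its five colon-separated components: the literal "ECLI", a country code, a court code, the year of the decision and an ordinal number. The identifier in the example is made up for illustration and does not refer to a real decision.

```python
# Minimal sketch of parsing an ECLI (European Case Law Identifier).
# The example identifier is fictitious.
from typing import NamedTuple


class Ecli(NamedTuple):
    country: str
    court: str
    year: int
    ordinal: str


def parse_ecli(identifier: str) -> Ecli:
    """Split an ECLI into its components and validate the fixed prefix."""
    prefix, country, court, year, ordinal = identifier.split(":")
    if prefix != "ECLI":
        raise ValueError(f"Not an ECLI identifier: {identifier}")
    return Ecli(country, court, int(year), ordinal)


print(parse_ecli("ECLI:ES:TS:2020:1234"))  # fictitious example
```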

These two important initiatives, which facilitate access to and automated processing of legal information, have required a shift from a document-based management model (official gazette, court decisions) to a data-based model. And it is precisely this paradigm shift that has made it possible to offer advanced information services that go beyond the legal and linguistic limits posed by regulatory and linguistic diversity across the European Union.

In any case, while recognising the important progress these initiatives represent, significant challenges remain: enabling access by specific provisions rather than by whole normative documents, retrieving judicial decisions on the basis of the rules they apply and, also, linking rules with their judicial interpretation by the various courts in all Member States. In the latter two scenarios the challenge is even greater, as they would require the automated linking of both identifiers.

2. Public procurement data space

This is undoubtedly one of the areas with the greatest potential impact, given that in the European Union as a whole, it is estimated that public entities spend around two trillion euros (almost 14% of GDP) on the purchase of services, works and supplies. This space is therefore intended not only to facilitate access to the public procurement market across the European Union but also to strengthen transparency and accountability in public procurement spending, which is essential in the fight against corruption and in improving efficiency.

The practical relevance of this space is reinforced by the fact that it has a specific official document that strongly supports the project and sets out a precise roadmap with the objective of ensuring its deployment within a reasonable timeframe. Moreover, despite limitations in its scope of application (there is no provision for extending the publication obligation to contracts below the thresholds set at European level, nor for contract completion notices), it is at a very advanced stage, in particular as regards the availability of a specific ontology which facilitates the accessibility of information and its re-use by reinforcing the conditions for interoperability.

In short, this space is facilitating the automated processing of public procurement data by interconnecting existing datasets, thus providing a more complete picture of public procurement in the European Union as a whole, even though it has been estimated that there are more than 250,000 contracting authorities awarding public contracts.

3. Single Technical System (e-Government)

This new space is intended to support the need that exists in administrative procedures to collect information issued by the administrations of other States, without the interested parties being required to do so directly. It is therefore a matter of automatically and securely gathering the required evidence in a formalised environment based on the direct interconnection between the various public bodies, which will thus act as authentic sources of the required information.

This initiative is linked to the objective of administrative simplification and, in particular, to the practical implementation of the "once only" principle in eGovernment procedures.

4. Security data space for innovation

The objective here is to improve law enforcement authorities' access to the data needed to train and validate algorithms with the aim of enhancing the use of artificial intelligence and thus strengthening law enforcement in full respect of ethical and legal standards.

While there is a clear need to facilitate the exchange of data between Member States' law enforcement authorities, the working paper emphasises that this is not a priority for AI strategies in this area, and that the advanced use of data in this area from an innovation perspective is currently relatively low.

In this respect, it is appropriate to highlight the initiative for the development of the Europol sandbox, a project that was sponsored by the decision of the Standing Committee on Operational Cooperation on Internal Security (COSI) to create an isolated space that allows States to develop, train and validate artificial intelligence and machine learning models.

Now that the process of digitisation of public entities is largely consolidated, the main challenge for data spaces in this area is to provide adequate technical, legal and organisational conditions to facilitate data availability and interoperability. In this sense, these data spaces should be taken into account when expanding the list of high-value data, along the lines already advanced by the study published by the European Commission in 2023, which emphasises that the datasets with the greatest potential are those related to government and public administration, justice and legal matters, as well as financial data.


Content prepared by Julián Valero, Professor at the University of Murcia and Coordinator of the "Innovation, Law and Technology" Research Group (iDerTec). The contents and points of view reflected in this publication are the sole responsibility of the author.

Blog

Data sandboxes are tools that provide us with environments to test new data-related practices and technologies, making them powerful instruments for managing and using data securely and effectively. These spaces are very useful in determining whether and under what conditions it is feasible to open the data. Some of the benefits they offer are:

  • Controlled and secure environments: provide a workspace where information can be explored and its usefulness and quality assessed before committing to wider sharing. This is particularly important in sensitive sectors, where privacy and data security are paramount.
  • Innovation: they provide a safe space for experimentation and rapid prototyping, allowing new ideas and data-driven solutions to be iterated, tested and refined on a test bench before launching them to the public.
  • Multi-sectoral collaboration: facilitate collaboration between diverse actors, including government entities, private companies, academia and civil society. This multi-sectoral approach helps to break down data silos and promotes the sharing of knowledge and good practices across sectors.
  • Adaptive and scalable use: they can be adjusted to suit different data types, use cases and sectors, making them a versatile tool for a variety of data-driven initiatives.
  • Cross-border data exchange: they provide a viable solution to manage the challenges of data exchange between different jurisdictions, especially with regard to international privacy regulations.

The report "Data Sandboxes: Managing the Open Data Spectrum" explores the concept of data sandboxes as a tool to strike the right balance between the benefits of open data and the need to protect sensitive information.

Value proposition for innovation

In addition to all the benefits outlined above, data sandboxes also offer a strong value proposition for organisations looking to innovate responsibly. These environments help to improve data quality by making it easier for users to identify inconsistencies so that improvements can be made. They also contribute to reducing risks by providing secure environments in which to work with sensitive data. By fostering cross-disciplinary experimentation, collaboration and innovation, they increase the usability of data and help develop a data-driven culture within organisations. In addition, data sandboxes help reduce barriers to data access, improving transparency and accountability, which strengthens citizens' trust and leads to an expansion of data exchanges.

Types of data sandboxes and characteristics

Depending on the main objective when implementing a sandbox, there are three different types of sandboxes:

  1. Regulatory sandboxes, which allow companies and organisations to test innovative services under the close supervision of regulators in a specific sector or area.
  2. Innovation sandboxes, which are frequently used by developers to test new features and get quick feedback on their work.
  3. Research sandboxes, which make it easier for academia and industry to safely test new algorithms or models by focusing on the objective of their tests, without having to worry about breaching established regulations.

In any case, regardless of the type of sandbox we are working with, they are all characterised by the following common key aspects:

Figure 1. Characteristics of a data sandbox: adaptable and scalable, controlled, secure, multi-sectoral and collaborative, high computational capacity, temporal in nature. Adapted from a visual by The GovLab.

Each of these is described below:

  1. Controlled: these are restricted environments where sensitive data can be accessed and analysed securely, ensuring compliance with relevant regulations.
  2. Secure: they protect the privacy and security of data, often using anonymised or synthetic data.
  3. Collaborative: facilitating collaboration between different regions, sectors and roles, strengthening data ecosystems.
  4. High computational capacity: provide advanced computational resources capable of performing complex tasks on the data when needed.
  5. Temporal in nature: They are designed for temporary use and with a short life cycle, allowing for rapid and focused experimentation that either concludes once its objective is achieved or becomes a new long-term project.
  6. Adaptable: They are flexible enough to customise and scale according to needs and different data types, use cases and contexts.

Examples of data sandboxes

Data sandboxes have long been successfully implemented in multiple sectors across Europe and around the world, so we can easily find several examples of their implementation on our continent:

  • Data science lab in Denmark: it provides access to sensitive administrative data useful for research, fostering innovation under strict data governance policies.
  • TravelTech in Lithuania: an open access sandbox that provides tourism data to improve business and workforce development in the sector.
  • INDIGO Open Data Sandbox: it promotes data sharing across sectors to improve social policies, with a focus on creating a secure environment for bilateral data sharing initiatives.
  • Health data science sandbox in Denmark: a training platform for researchers to practice data analysis using synthetic biomedical data without having to worry about strict regulation.

Future direction and challenges

As we have seen, data sandboxes can be a powerful tool for fostering open data, innovation and collaboration, while ensuring data privacy and security. By providing a controlled environment for experimentation with data, they enable all interested parties to explore new applications and knowledge in a reliable and safe way. Sandboxes can therefore help overcome initial barriers to data access and contribute to fostering a more informed and purposeful use of data, thus promoting the use of data-driven solutions to public policy problems.

However, despite their many benefits, data sandboxes also present a number of implementation challenges. The main problems we might encounter in implementing them include:

  • Relevance: ensure that the sandbox contains high quality and relevant data, and that it is kept up to date.
  • Governance: establish clear rules and protocols for data access, use and sharing, as well as monitoring and compliance mechanisms.
  • Scalability: successfully export the solutions developed within the sandbox and be able to translate them into practical applications in the real world.
  • Risk management: address comprehensively all risks associated with the re-use of data throughout its lifecycle and without compromising its integrity.

However, as technologies and policies continue to evolve, it is clear that data sandboxes are set to be a useful tool and to play an important role in managing the spectrum of data openness, thereby driving the use of data to solve increasingly complex problems. Furthermore, the future of data sandboxes will be influenced by new regulatory frameworks (such as the Data Regulation and the Data Governance Regulation) that reinforce data security and promote data reuse, and by integration with privacy-preserving and privacy-enhancing technologies that allow us to use data without exposing sensitive information. Together, these trends will drive more secure data innovation within the environments provided by data sandboxes.


Content prepared by Carlos Iglesias, Open data Researcher and consultant, World Wide Web Foundation. The contents and views expressed in this publication are the sole responsibility of the author.

Blog

IMPaCT, the Infrastructure for Precision Medicine associated with Science and Technology, is an innovative programme that aims to revolutionise medical care. Coordinated and funded by the Carlos III Health Institute, it aims to boost the effective deployment of personalised precision medicine.

Personalised medicine is a medical approach that recognises that each patient is unique. By analysing each person's genetic, physiological and lifestyle characteristics, safer and more efficient tailor-made treatments with fewer side effects can be developed. Access to this information is also key to making progress in prevention and early detection, as well as in research and medical advances.

IMPaCT consists of 3 strategic axes:

  • Axis 1. Predictive medicine: COHORTE Programme. An epidemiological research project consisting of the development and implementation of a structure for the recruitment of 200,000 people to participate in a prospective study.
  • Axis 2. Data science: DATA Programme. A programme focused on the development of a common, interoperable and integrated system for the collection and analysis of clinical and molecular data. It develops criteria, techniques and best practices for the collection of information from electronic medical records, medical images and genomic data.
  • Axis 3. Genomic medicine: GENOMICS Programme. A cooperative infrastructure for the diagnosis of rare and genetic diseases. Among other things, it develops standardised procedures for the correct performance of genomic analyses and the management of the data obtained, as well as for the standardisation and homogenisation of the information and criteria used.

In addition to these axes, there are two transversal strategic lines: one focused on ethics and scientific integrity and the other on internationalisation, as summarised in the following visual.

Figure: Pillars of the IMPaCT project, the Infrastructure for Precision Medicine associated with Science and Technology: three strategic axes (predictive medicine, data science and genomic medicine) plus two cross-cutting strategic lines (ethics and scientific integrity, and internationalisation). Source: IMPaCT-Data.

In the following, we will focus on the functioning and results of IMPaCT-Data, the project linked to axis 2.

IMPaCT-Data, an integrated environment for interoperable data analysis

IMPaCT-Data is oriented towards the development and validation of an environment for the integration and joint analysis of clinical, molecular and genetic data, for secondary use, with the ultimate goal of facilitating the effective and coordinated implementation of personalised precision medicine in the National Health System. It is currently made up of a consortium of 45 entities associated by an agreement that runs until 31 December 2025.

Through this programme, the aim is to create a cloud infrastructure for medical data for research, as well as the necessary protocols to coordinate, integrate, manage and analyse such data. To this end, a roadmap with the following technical objectives is followed:

  • Development of the first iteration of a federated biomedical data platform.
  • Development of the first version of a cloud computing infrastructure that can support IMPaCT.
  • Development of integrated data analysis protocols, methods and systems, including FAIRification.
  • Initial development for the monitoring of data quality treatment and evaluation processes.
  • Initial development for the (semi-)automatic and secure extraction of information from health information systems, including the Electronic Health Record (EHR).
  • Incorporation of genetic and genomic information.
  • Leading the portfolio of bioinformatics resources offered by Spain to ELIXIR.
  • Extraction of quantitative information from medical images.
  • Development of prototypes for the integration of genomic analysis, imaging and EHR.
  • Implementation of demonstrators of advanced translational information interoperability functions.
  • Evaluation and concerted implementation of management demonstrators, in collaboration with the TransBioNet network and other health stakeholders.

Source: IMPaCT-Data.

Results of IMPaCT-Data

As we can see, this infrastructure, still under development, will provide a virtual research environment for data analysis through a variety of services and products.

In addition to these, there are a number of deliverables related to technical aspects of the project, such as comparisons of techniques or proofs of concept, as well as scientific publications.

Driving use cases through demonstrators

One of the objectives of IMPaCT-Data is to contribute to the evaluation of technologies associated with the project's developments, through an ecosystem of demonstrators. The aim is to encourage contributions from companies, organisations and academic groups to drive improvements and achieve large-scale implementation of the project.

To meet this objective, different activities are organised where specific components are evaluated in collaboration with members of IMPaCT-Data. One example is the oRBITS terminology server for the encoding of clinical phenotypes into HPO (Human Phenotype Ontology) aimed at automatically extracting and encoding information contained in unstructured clinical reports using natural language processing. It uses the HPO terminology, which aims to standardise the collection of phenotypic data, making it accessible for further analysis.
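As a purely didactic illustration of the input and output of such a component, the toy sketch below maps free-text mentions to HPO identifiers with a simple dictionary lookup. The real oRBITS server relies on natural language processing rather than string matching, and any term identifiers should be verified against the current HPO release.

```python
# Toy illustration of encoding phenotype mentions from clinical free text into
# HPO terms. This is a simplified stand-in for an NLP-based pipeline.
PHENOTYPE_LEXICON = {
    "seizure": "HP:0001250",     # Seizure
    "convulsión": "HP:0001250",  # Spanish synonym mapped to the same HPO term
}


def encode_report(text: str) -> set[str]:
    """Return the HPO identifiers whose lexicon entries appear in the text."""
    lowered = text.lower()
    return {code for mention, code in PHENOTYPE_LEXICON.items() if mention in lowered}


print(encode_report("Paciente con convulsión febril a los 2 años"))  # {'HP:0001250'}
```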

Another example of a demonstrator concerns the sharing of virtualised medical data between different centres for research projects, within a governed, efficient and secure environment where all the data quality standards defined by each entity are met.

A strategic project aligned with Europe

IMPaCT-Data fits directly into the National Strategy for the Secondary Use of National Health System Data, as described in the PERTE on health (Strategic Projects for Economic Recovery and Transformation), with its knowledge, experience and input being of great value for the development of the National Health Data Space.

Furthermore, IMPaCT-Data's developments are directly aligned with the guidelines proposed by GAIA-X both at a general level and in the specific health environment.

The impact of the project in Europe is also evidenced by its participation in the European GDI (Genomic Data Infrastructure) project, which aims to facilitate access to genomic, phenotypic and clinical data across Europe and in which IMPaCT-Data is being used as a tool at national level.

This shows that thanks to IMPaCT-Data it will be possible to promote biomedical research projects not only in Spain, but also in Europe, thus contributing to the improvement of public health and individualised treatment of patients.

Blog

One of the main objectives of Regulation (EU) 2023/2854 of the European Parliament and of the Council of 13 December 2023 on harmonised rules for fair access to and use of data (Data Regulation) is to promote the development of interoperability criteria for data spaces, data processing services and smart contracts. In this respect, the Regulation understands interoperability as:

The ability of two or more data spaces or communication networks, systems, connected products, applications, data processing services or components to exchange and use data to perform their functions.

It explicitly states that "interoperable and high quality data from different domains increase competitiveness and innovation and ensure sustainable economic growth", which requires that "the same data can be used and reused for different purposes and in an unlimited way, without loss of quality or quantity". It therefore considers that "a regulatory approach to interoperability that is ambitious and inspires innovation is essential to overcome the dependence on a single provider, which hinders competition and the development of new services".

Interoperability and data spaces

This concern already existed in the European Data Strategy where interoperability was seen as a key element for the valorisation of data and, in particular, for the deployment of Artificial Intelligence. In fact, interoperability is an unavoidable premise for data spaces, so that the establishment of appropriate protocols becomes essential to ensure their potential, both for each of the data spaces internally and also in order to facilitate a cross-cutting integration of several of them.

In this sense, there are frequent standardisation initiatives and meetings to try to establish specific interoperability conditions in this type of scenario, characterised by the diversity of data sources. Although this is an added difficulty, a cross-cutting approach, integrating several data spaces, provides a greater impact on the generation of value-added services and creates the right legal conditions for innovation.

According to the Data Regulation, those who participate in data spaces and offer data or data services to other actors involved in data spaces have to comply with a number of requirements aimed precisely at ensuring appropriate conditions for interoperability and thus that data can be processed jointly. To this end, a description of the content, structure, format and other conditions of use of the data shall be provided in such a way as to facilitate access to and sharing of the data in an automated manner, including in real time or allowing bulk downloading where appropriate.
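By way of illustration, such a description could be published in a machine-readable form along the following lines. The field names are a loose, DCAT-flavoured sketch chosen for this example; the Regulation does not prescribe a specific schema.

```python
# Illustrative machine-readable description of a dataset offered in a data
# space, covering the elements mentioned above: content, structure, format,
# conditions of use and access modality. Field names are not prescribed by the
# Data Regulation.
import json

dataset_description = {
    "title": "Hourly charging-point availability",
    "content": "Availability status of public EV charging points",
    "structure": {"station_id": "string", "timestamp": "ISO 8601", "free_slots": "integer"},
    "format": ["CSV", "JSON"],
    "conditions_of_use": "Licence and permitted purposes agreed within the data space",
    "access": {"bulk_download": True, "real_time_api": True},
}

print(json.dumps(dataset_description, indent=2))
```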

It should be noted that compliance with technical and semantic standards for interoperability is essential for data spaces, since a minimum standardisation of legal conditions greatly facilitates their operation. In particular, it is of great importance to ensure that the data provider holds the necessary rights to share the data in such an environment and is able to prove this in an automated way.

Interoperability in data processing services

The Data Regulation pays particular attention to the need to improve interoperability between different data processing service providers, so that customers can benefit from the interaction between each of them, thereby reducing dependency on individual providers.

To this end, it firstly reinforces the reporting obligations of providers of this type of service, to which must be added those derived from the general rules on the provision of digital content and services. In particular, the following must be set out in writing:

  • Contractual conditions relating to customer rights, especially in situations related to a possible switch to another provider or infrastructure.
  • A full indication of the data that may be exported during the switching process, so that the scope of the interoperability obligation has to be fixed in advance. In addition, such information has to be made available through an up-to-date online registry offered by the service provider (a minimal illustration follows this list).
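
As a purely illustrative sketch, and not the format of any real provider, the following Python snippet shows how such an online registry of exportable data and digital assets might be modelled. The asset names, categories and export formats are hypothetical.

```python
# Illustrative sketch only: one way a data processing service could keep an
# up-to-date registry of the data and digital assets exportable on switching.
from dataclasses import dataclass, asdict
import json

@dataclass
class ExportableAsset:
    name: str           # e.g. a customer database or a configuration set
    category: str       # "customer data" or "digital asset"
    export_format: str  # format offered during the switching process

# Hypothetical registry contents; a real provider would generate this from its
# service catalogue and keep it permanently up to date online.
REGISTRY = [
    ExportableAsset("customer_records", "customer data", "CSV"),
    ExportableAsset("object_storage_buckets", "customer data", "S3-compatible export"),
    ExportableAsset("service_configuration", "digital asset", "JSON"),
]

def registry_as_json() -> str:
    """Serialise the registry so it can be published as an online endpoint."""
    return json.dumps([asdict(a) for a in REGISTRY], indent=2)

if __name__ == "__main__":
    print(registry_as_json())
```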

The Regulation aims to ensure that customers' right to freely choose their data service provider is not undermined by barriers and difficulties arising from a lack of interoperability. It even contemplates an obligation of proactivity so that the switch of provider takes place without incidents in the provision of the service to the customer, obliging providers to adopt reasonable measures to ensure "functional equivalence" and even to offer open interfaces free of charge to facilitate the process. However, in some cases, in particular where two services are intended to be used in parallel, the former provider is allowed to pass on certain costs that may have been incurred.

Ultimately, the interoperability of data processing services goes beyond simple technical or semantic aspects, so that it becomes an unavoidable premise for ensuring the portability of digital assets, guaranteeing the security and integrity of services and, among other objectives, not interfering with the incorporation of technological innovations, all with a marked prominence of cloud services.

Smart contracts and interoperability

The Data Regulation also pays particular attention to the interoperability conditions allowing the automated execution of data exchanges, for which it is essential to set them in a predetermined way. Otherwise, the optimal operating conditions required by the digital environment, especially from the point of view of efficiency, would be affected.

The new regulation includes specific obligations for smart contract providers and also for those who deploy smart contract tools in the course of their commercial, business or professional activity. For this purpose, a smart contract is defined as:

a computer programme used for the automated execution of an agreement or part thereof, which uses a sequence of electronic data records and ensures their completeness and the accuracy of their chronological order

They have to ensure that smart contracts comply with the Regulation's obligations on the provision of data and, among other aspects, it will be essential to ensure "consistency with the terms of the data sharing agreement that the smart contract executes". They shall therefore be responsible for the effective fulfilment of these requirements, carrying out a conformity assessment and issuing a declaration of compliance.
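
To make the definition more tangible, the minimal Python sketch below shows a hash-chained sequence of data-exchange records whose completeness and chronological order can be verified automatically. It is a toy illustration of the idea behind the definition, not a real smart contract platform and not a mechanism prescribed by the Regulation.

```python
# Minimal sketch, not a production smart-contract engine: a hash-chained log of
# data-exchange records, illustrating how completeness and the accuracy of the
# chronological order of records can be made verifiable.
import hashlib
import json
import time

class RecordChain:
    def __init__(self):
        self.records = []

    def append(self, payload: dict) -> dict:
        prev_hash = self.records[-1]["hash"] if self.records else "0" * 64
        body = {
            "index": len(self.records),
            "timestamp": time.time(),   # chronological order
            "payload": payload,
            "prev_hash": prev_hash,     # links each record to the previous one
        }
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        self.records.append(body)
        return body

    def verify(self) -> bool:
        """Check that no record has been removed, reordered or altered."""
        for i, rec in enumerate(self.records):
            expected_prev = self.records[i - 1]["hash"] if i else "0" * 64
            if rec["prev_hash"] != expected_prev:
                return False
            recomputed = {k: v for k, v in rec.items() if k != "hash"}
            if hashlib.sha256(
                json.dumps(recomputed, sort_keys=True).encode()
            ).hexdigest() != rec["hash"]:
                return False
        return True

if __name__ == "__main__":
    chain = RecordChain()
    chain.append({"event": "data_shared", "dataset": "road-traffic-2024"})
    chain.append({"event": "payment_settled", "amount_eur": 120})
    print("chain valid:", chain.verify())
```

In practice these guarantees are typically provided by distributed ledger or equivalent infrastructure; the sketch only shows why tamper-evident, ordered records matter for automated execution.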

To facilitate the enforcement of these safeguards, the Regulation provides for a presumption of compliance where harmonised standards published in the Official Journal of the European Union are respected; in addition, the Commission is empowered to request European standardisation organisations to draw up specific provisions.

In the last five years, and in particular since the 2020 Strategy, there has been significant progress in European regulation, which makes it possible to state that the right legal conditions are in place to ensure the availability of quality data to drive technological innovation. As far as interoperability is concerned, very important steps have already been taken, especially in the public sector, where disruptive and extremely useful technologies can already be found. However, the challenge of precisely specifying the scope of the legally established obligations remains.

For this reason, the Data Regulation itself empowers the Commission to adopt common specifications to ensure effective compliance with the measures it envisages, if necessary. However, this is a subsidiary measure, as other avenues to achieve interoperability, such as the development of harmonised standards through standardisation organisations, must be pursued first.

In short, regulating interoperability requires an ambitious approach, as the Data Regulation itself recognises. It is, however, a complex process that calls for measures at different levels, going beyond the simple adoption of legal rules and beyond purely technological premises, even if such legislation represents an important step forward in boosting innovation under the right conditions.


Content prepared by Julián Valero, Professor at the University of Murcia and Coordinator of the Research Group "Innovation, Law and Technology" (iDerTec). The contents and points of view reflected in this publication are the sole responsibility of its author.

Blog

The publication on Friday 12 July 2024 of the Artificial Intelligence Regulation (AIA) opens a new stage in the European and global regulatory framework. The standard is characterised by an attempt to combine two souls. On the one hand, it is about ensuring that technology does not create systemic risks for democracy, the guarantee of our rights and the socio-economic ecosystem as a whole. On the other hand, a targeted approach to product development is sought in order to meet the high standards of reliability, safety and regulatory compliance defined by the European Union.

Scope of application of the standard

The standard differentiates between low- and medium-risk systems, high-risk systems and general-purpose AI models. In order to classify systems, the AIA defines criteria related to the sectors regulated by the European Union (Annex I) and defines the content and scope of those systems which, by their nature and purpose, could generate risks (Annex III). The qualification of the models depends largely on the volume of data, their capabilities and their operational load.

The AIA only affects the latter two cases: high-risk systems and general-purpose AI models. High-risk systems require conformity assessment through notified bodies, entities to which evidence is submitted that the development complies with the AIA. The models, for their part, are subject to oversight mechanisms by the Commission aimed at preventing systemic risks. However, this is a flexible regulatory framework that favours research by relaxing its application in experimental environments and through the deployment of sandboxes for development.

The standard sets out a series of "requirements for high-risk AI systems" (section two of chapter three) which should constitute a reference framework for the development of any system and inspire codes of good practice, technical standards and certification schemes. In this respect, Article 10 on "data and data governance" plays a central role. It provides very precise indications on the design conditions for AI systems, particularly when they involve the processing of personal data or when they are projected on natural persons.

This governance should be considered by those providing the basic infrastructure and/or datasets, managing data spaces or so-called Digital Innovation Hubs, offering support services. In our ecosystem, characterised by a high prevalence of SMEs and/or research teams, data governance is projected on the quality, security and reliability of their actions and results. It is therefore necessary to ensure the values that AIA imposes on training, validation and test datasets in high-risk systems, and, where appropriate, when techniques involving the training of AI models are employed.

These values can be aligned with the principles of Article 5 of the General Data Protection Regulation (GDPR) and enrich and complement them. To these are added the risk-based approach and data protection by design and by default. Relating one to the other is certainly an interesting exercise.

Ensure the legitimate origin of the data. Fairness and lawfulness

Alongside the common reference to the value chain associated with data, reference should be made to a 'chain of custody' to ensure the legality of data collection processes. The origin of the data, particularly in the case of personal data, must be lawful, legitimate and its use consistent with the original purpose of its collection. A proper cataloguing of the datasets at source is therefore indispensable to ensure a correct description of their legitimacy and conditions of use.

This is an issue that concerns open data environments, data access bodies and services detailed in the Data Governance Regulation (DGA) or the European Health Data Space (EHDS) and is sure to inspire future regulations. It is usual to combine external data sources with the information managed by the SME.

Data minimisation, accuracy and purpose limitation

AIA mandates, on the one hand, an assessment of the availability, quantity and adequacy of the required datasets. On the other hand, it requires that the training, validation and test datasets are relevant, sufficiently representative and possess adequate statistical properties. This task is highly relevant to the rights of individuals or groups affected by the system. In addition, they shall, to the greatest extent possible, be error-free and complete in view of their intended purpose. AIA predicates these properties for each dataset individually or for a combination of datasets.

In order to achieve these objectives, it is necessary to ensure that appropriate techniques are deployed (a brief code sketch follows the list below):

  • Perform appropriate processing operations for data preparation, such as annotation, tagging, cleansing, updating, enrichment and aggregation.
  • Formulate assumptions, in particular with regard to the information that the data are supposed to measure and represent or, to put it more colloquially, define the use cases.
  • Take into account, to the extent necessary for the intended purpose, the particular characteristics or elements of the specific geographical, contextual, behavioural or functional environment in which the high-risk AI system is intended to be used.
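
The following Python sketch, using pandas, illustrates some of these operations on an invented dataset: cleansing, a simple enrichment and an aggregation used as a rough representativeness check. The column names and the 10% threshold are assumptions chosen for the example, not requirements of the AIA.

```python
# Illustrative sketch only (assumed column names and thresholds): a few of the
# preparation operations listed above (cleansing, enrichment, aggregation)
# plus a simple representativeness check on a training dataset.
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates()                      # cleansing: remove duplicates
    df = df.dropna(subset=["age", "region"])       # cleansing: drop incomplete rows
    df["age_band"] = pd.cut(                       # enrichment: derived feature
        df["age"], bins=[0, 30, 50, 120], labels=["<30", "30-50", ">50"]
    )
    return df

def representativeness(df: pd.DataFrame, column: str) -> pd.Series:
    """Aggregation: share of each group, to compare against the target population."""
    return df[column].value_counts(normalize=True)

if __name__ == "__main__":
    sample = pd.DataFrame({
        "age": [23, 41, 35, 62, None, 29],
        "region": ["North", "South", "North", "East", "South", None],
    })
    prepared = prepare(sample)
    shares = representativeness(prepared, "region")
    print(shares)
    # Hypothetical check: flag groups that fall below an assumed 10% floor.
    print("under-represented:", list(shares[shares < 0.10].index))
```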

Managing risk: avoiding bias 

In the area of data governance, a key role is attributed to the avoidance of bias where it may lead to risks to the health and safety of individuals, adversely affect fundamental rights or give rise to discrimination prohibited by Union law, in particular where data outputs influence the inputs of future operations (feedback loops). To this end, appropriate measures should be taken to detect, prevent and mitigate any biases identified.
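
By way of example, one simple measure sometimes used to detect this kind of bias is the difference in positive-outcome rates between groups. The sketch below computes it for an invented dataset; the group labels and the 0.1 tolerance are chosen purely for illustration and do not reflect any metric mandated by the AIA.

```python
# Minimal sketch of one possible bias-detection measure (not prescribed by the
# AIA): the difference in positive-outcome rates between two groups, sometimes
# called the demographic parity difference.
from typing import Sequence

def positive_rate(outcomes: Sequence[int]) -> float:
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def demographic_parity_difference(
    outcomes: Sequence[int], groups: Sequence[str], group_a: str, group_b: str
) -> float:
    rate_a = positive_rate([o for o, g in zip(outcomes, groups) if g == group_a])
    rate_b = positive_rate([o for o, g in zip(outcomes, groups) if g == group_b])
    return abs(rate_a - rate_b)

if __name__ == "__main__":
    outcomes = [1, 0, 1, 1, 0, 0, 1, 0]           # e.g. loan granted / not granted
    groups   = ["A", "A", "A", "A", "B", "B", "B", "B"]
    gap = demographic_parity_difference(outcomes, groups, "A", "B")
    print(f"parity gap: {gap:.2f}")
    if gap > 0.1:   # assumed tolerance; a real assessment needs domain criteria
        print("possible bias detected: review the training data and the model")
```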

The AIA exceptionally enables the processing of special categories of personal data provided that they offer adequate safeguards in relation to the fundamental rights and freedoms of natural persons. But it imposes additional conditions:

  • the processing of other data, such as synthetic or anonymised data, does not allow effective detection and correction of biases;
  • that special categories of personal data are subject to technical limitations concerning the re-use of personal data and to state-of-the-art security and privacy protection measures, including pseudonymisation;
  • that special categories of personal data are subject to measures to ensure that the personal data processed are secured, protected and subject to appropriate safeguards, including strict controls and documentation of access, to prevent misuse and to ensure that only authorised persons have access to such personal data with appropriate confidentiality obligations;
  • that special categories of personal data are not transmitted or transferred to third parties and are not otherwise accessible to them;
  • that special categories of personal data are deleted once the bias has been corrected or the personal data have reached the end of their retention period, whichever is the earlier;
  • that the records of processing activities under Regulations (EU) 2016/679 and (EU) 2018/1725 and Directive (EU) 2016/680 include the reasons why the processing of special categories of personal data was strictly necessary for detecting and correcting bias, and why that purpose could not be achieved by processing other data.

The regulatory provisions are extremely interesting. The GDPR, the DGA and the EHDS favour the processing of anonymised data. The AIA makes an exception for cases in which synthetic or anonymised data would yield inadequate or low-quality datasets from the point of view of bias detection and correction.

Individual developers, data spaces and intermediary services providing datasets and/or platforms for development must be particularly diligent in defining their security. This provision is consistent with the requirement to have secure processing spaces in EHDS, implies a commitment to certifiable security standards, whether public or private, and advises a re-reading of the seventeenth additional provision on data processing in our Organic Law on Data Protection in the area of pseudonymisation, insofar as it adds ethical and legal guarantees to the strictly technical ones.  Furthermore, the need to ensure adequate traceability of uses is underlined. In addition, it will be necessary to include in the register of processing activities a specific mention of this type of use and its justification.
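
As a minimal illustration of two of these ideas, pseudonymisation and traceability of uses, the following Python sketch applies a keyed hash to an identifier and keeps a simple access log. Key management, authorisation and retention rules are deliberately left out, so it should be read as a sketch rather than a complete security design; the identifier and the log fields are invented.

```python
# Illustrative sketch, not a complete security design: pseudonymisation of an
# identifier via a keyed hash (HMAC), plus a minimal access log to illustrate
# traceability of uses. Key handling and authorisation are assumed elsewhere.
import hashlib
import hmac
import datetime

SECRET_KEY = b"replace-with-a-key-held-separately"   # placeholder; keep outside the dataset

ACCESS_LOG: list[dict] = []

def pseudonymise(identifier: str) -> str:
    """Deterministic pseudonym: the same input maps to the same token,
    but the original value cannot be recovered without the key."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

def log_access(user: str, purpose: str) -> None:
    """Record who accessed the pseudonymised data and why (traceability)."""
    ACCESS_LOG.append({
        "user": user,
        "purpose": purpose,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })

if __name__ == "__main__":
    token = pseudonymise("12345678A")          # hypothetical national ID
    log_access(user="bias-audit-team", purpose="bias detection and correction")
    print(token[:16], "...", ACCESS_LOG)
```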

Apply lessons learned from data protection, by design and by default

Article 10 of the AIA requires the documentation of relevant design decisions and the identification of relevant data gaps or deficiencies that prevent compliance with the AIA, together with how they are to be addressed. In short, it is not enough to ensure data governance; it is also necessary to provide documentary evidence and to maintain a proactive and vigilant attitude throughout the lifecycle of information systems.

These two obligations form the keystone of the system, and they should be read even more broadly in their legal dimension. The lessons learned from the GDPR teach that proactive accountability and the guarantee of fundamental rights have a dual dimension. The first is intrinsic and material: the deployment of privacy engineering in the service of data protection by design and by default ensures compliance with the GDPR. The second is contextual: the processing of personal data does not take place in a vacuum, but in a broad and complex context regulated by other branches of the law.

Data governance operates structurally from the foundation to the vault of AI-based information systems. Ensuring that it exists and that it is adequate and functional is essential. This is the understanding of the Spanish Government's Artificial Intelligence Strategy 2024, which seeks to provide the country with the levers to boost its development.

The AIA makes a qualitative leap and underlines the functional approach from which data protection principles should be read, stressing their population-level dimension. This makes it necessary to rethink the conditions under which the GDPR has been complied with in the European Union. There is an urgent need to move away from template-based models that consultancy firms copy and paste. Checklists and standardisation are, of course, indispensable, but their effectiveness depends heavily on fine tuning. This calls in particular on the professionals who support this objective to dedicate their best efforts to giving deep meaning to compliance with the Artificial Intelligence Regulation.

You can see a summary of the regulations in the following infographic:

Screenshot of the infographic

You can access the accessible and interactive version here

Content prepared by Ricard Martínez, Director of the Chair of Privacy and Digital Transformation. Professor, Department of Constitutional Law, Universitat de València. The contents and points of view reflected in this publication are the sole responsibility of its author.

Blog

The European Union has devised a fundamental strategy to ensure accessible and reusable data for research, innovation and entrepreneurship. Strategic decisions have been made both in a regulatory and in a material sense to build spaces for data sharing and to foster the emergence of intermediaries with the capacity to process information.

European policies give rise to a very diverse ecosystem in which several strands should be distinguished. On the one hand, there is a deepening of open data re-use policies. On the other hand, the aim is to cover a space that has until now been inaccessible: data that, owing to the guarantee of the fundamental right to data protection, intellectual property or business secrecy, could not be shared. Today, anonymisation and data intermediation technologies make it possible to process such data with due guarantees. Finally, the aim is to provide resources through the promotion of data spaces, initiatives that propose federated models, such as Gaia-X, the European Digital Infrastructure Consortia (EDIC) promoted by the European Commission, and the Digital Innovation Hubs aimed at supporting business and government in this field. This scenario will boost different types of use in research, innovation and entrepreneurship.

This article focuses on the agreement signed by the National Statistics Institute (INE), the State Tax Administration Agency (AEAT), different Social Security bodies, the State Public Employment Service (SEPE) and the Bank of Spain to boost access to data, which is part of this EU strategy whose principles, rules and conditions must be explained in order to place it in context, underline its importance and understand the implications of the agreement.    

Competing by guaranteeing our rights

The EU competes at a structural disadvantage vis-à-vis the US and the People's Republic of China. On the American side, the development of disruptive technologies in the context of the Internet, and particularly the deployment of search engines, social networks and mobile applications, has favoured the birth of a data brokerage market in which a few companies hold an almost monopolistic power over data. The great champions of the digital world manage information on practically every sector of activity, thanks to a business model based on the capitalisation or commoditisation of our privacy and to their entry into sectors such as health or wearable activity trackers. Every time a user ran a search, sent an email, commented on a social network or dictated a message to a mobile phone, it fuelled that position of dominance and underpinned the development of large language models in artificial intelligence and the deployment of algorithmic tools linked to neuro-emotional marketing.

On the Chinese side, there is a closed internet model under state control, with the state participating in and monitoring the large local multinationals in the sector and holding a global dominance over 5G network traffic. It is a surveillance state that has become the leading power in the deployment of artificial intelligence through video surveillance and facial recognition, and it has a very clear state policy on the deployment of artificial intelligence (AI), creating advantages to compete in this race.

The EU starts from an apparently disadvantageous position. This is not at all due to a lack of talent or capability: much of the Internet and IT ecosystem has been developed in Europe or by European talent. However, our market has not been able to generate the conditions for the emergence of major technological champions capable of supporting the entire value chain, from cloud infrastructures to the availability of the large volumes of data that feed this ecosystem. Moreover, the EU has adopted an ethical, political and legal commitment to freedoms, equity and democracy. This position, which has operated as a kind of barrier in terms of costs and processes, embodies the essential requirements for a democratic, inclusive and liberty-guaranteeing digital transformation.

The Data Governance Act

The legal substratum of data sharing is a complex modular structure comprising the General Data Protection Regulation (GDPR), the Open Data and Re-use of Public Sector Information Directive, the Data Governance Act (DGA), the Data Act (DA) and, in the immediate future, the Artificial Intelligence Act and the European Health Data Space Regulation (EHDS). These rules should facilitate the re-use of data, including data covered by data protection, intellectual property and business secrecy. For this to be possible, several factors must operate together, as set out below:

  1. Data sharing from government should grow exponentially and generate a data market that is currently monopolised by foreign companies.
  2. Digital sovereignty in legal terms will also be a growth driver insofar as it defines market rules based on the philosophy of the European Union centred on the guarantee of fundamental rights. This should have an immediate consequence when defining processes aimed at producing safe and reliable products.
  3. Digital sovereignty will in turn have important technological consequences. Public data spaces, whether promoted from digital hubs or federations of nodes, such as Gaia X, should make data available to the individual researcher or start-up, including application dashboards and technical support.
  4. The result of the regulation is to accelerate and increase the possibilities for freeing and sharing data. The EU and the agreement under discussion seek to release data subject to trade secrecy, intellectual property or, in particular, the protection of personal data, in a secure manner through intermediation processes in secure data environments. This matter has occupied, among others, the Spanish Data Protection Agency and the European Union Agency for Cybersecurity (ENISA). It implies a commitment to anonymisation and/or quasi-anonymisation environments through technologies such as differential privacy, homomorphic encryption or multi-party computation (a minimal sketch of one such technique follows below).
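
For illustration, the sketch below applies the Laplace mechanism of differential privacy, one of the techniques mentioned in point 4, to a simple count query. The data and the epsilon value are invented, and nothing here describes the tools the signatory institutions actually use.

```python
# Minimal sketch of the Laplace mechanism of differential privacy applied to a
# count query. The epsilon value is an arbitrary assumption; real deployments
# need a careful privacy-budget analysis.
import random

def dp_count(records: list, predicate, epsilon: float = 1.0) -> float:
    """Return a noisy count of records matching the predicate.
    The sensitivity of a count query is 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    # Difference of two exponentials with rate epsilon follows a Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

if __name__ == "__main__":
    incomes = [18_000, 22_500, 31_000, 47_000, 52_000, 75_000]
    noisy = dp_count(incomes, lambda x: x > 30_000, epsilon=0.5)
    print(f"noisy count of incomes above 30,000: {noisy:.1f}")
```

The smaller the epsilon, the more noise is added and the stronger the privacy guarantee, at the cost of less precise answers.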

All of this rests on the guarantee of fundamental rights and the empowerment of people. The GDPR, DGA, DA and EHDS should make it possible to achieve the dual objective of creating a European market for the free movement and re-use of protected data, while ensuring that individuals and organisations can exercise their rights of control, share data under those rights and engage in data altruism. Moreover, the GDPR, DGA, EHDS and the Artificial Intelligence Act define precise limits through prohibitions on use, regulated access conditions and ethically and legally sound design procedures. One idea should be considered central: there is a dimension of public or common interest that, beyond the epic battles of COVID, reaches the small but essential aspirations of the individual researcher, the disruptive entrepreneur, the SME trying to improve its value chain or the Administration innovating processes at the service of people.

Spain commits to the digital transformation of data spaces

The 2025 Plan, the Artificial Intelligence Strategy, the efforts of the Next Generation funds through the Strategic Projects for Economic Recovery and Transformation (PERTE, in its Spanish acronym), the AI Missions and the Digital Bill of Rights exemplify Spain's alignment and leadership in this field. To make these strategies viable, secure data and processing environments are essential. The National Health Data Space has now been joined by the agreement between the INE, the AEAT, different Social Security bodies, the SEPE and the Bank of Spain. As its explanatory memorandum states, it constitutes a first and encouraging step towards the deployment of the DGA in our country.

The signatory institutions understand not only the scientific and business value of the statistical information they handle, but also the significant growth in the demand and need for it. They also take on a qualitatively relevant issue: the value derived from interconnecting datasets. They therefore declare their willingness to maximise the added value of their data by allowing cross-referencing or integration when research is carried out for scientific purposes in the public interest.

The keys to the agreement to provide statistical data to researchers for scientific purposes in the public interest

Some of the questions that may arise with regard to this agreement are answered below.

  • How can the data be accessed?

Access to the data requires a request for cross-referenced information that must be individually approved by each institution. The assessment takes into account criteria regarding the nature of the data and the interest of the proposal.

Facilitating this access requires the signatory institutions to undertake de-identification and cross-referencing work, carried out by each of them directly or through trusted third parties. The result, "depending on the security level of the resulting file", will entail either:

  • direct and autonomous access.
  • processing of the data in one of the secure rooms or centres made available by the signatory entities.

Several secure rooms and centres are already available for this purpose.

Also noteworthy is the creation of ES_DataLab, which facilitates access to microdata in an environment that guarantees the confidentiality of the information. It allows cross-referencing data from different participating institutions, such as the INE, the AEAT, the Secretary of State for Social Security and Pensions, the Social Security General Treasury (TGSS), the National Social Security Institute (INSS), the Social Marine Institute (ISM), the Social Security IT Management (GISS), the State Public Employment Service and the Bank of Spain.

In implementation of the DGA's provisions, the Single National Information Point (NSIP), managed by the General Directorate of Data, has been set up; there, citizens, businesses and researchers can locate information on protected public sector data. It is available through datos.gob.es.

  • What data is shared?

The volume and typologies of data they handle are truly significant. The press release presenting the agreement stated that it would be possible to access "the microdata bases owned by the INE, the AEAT, the SS and the BE, with the necessary guarantees of security, statistical secrecy, personal data protection and compliance with current legislation. In addition to statistical databases from its surveys, INE may also provide access to administrative registers, both those compiled or coordinated by INE and those under other ownership but which INE uses to compile its statistics (in the latter case consulting all requests for access to the holders of the corresponding registers)".

  • Who can access the data?

In order to grant access to the data, the confidentiality regime applicable to the data requested and its legal framework, the social interest of the results to be obtained in the research, the profile, trajectory and scientific publications of the principal investigator and associated researchers or the history of research projects of the entity backing the project, among other aspects, shall be taken into account.

One of the issues envisaged by the DGA in this area consists of establishing economic considerations that ensure the sustainability of the system. In any case, the third clause of the agreement provides for the possibility of receiving financial consideration from applicants for the services of preparing and making available the data contained in the databases owned by them, in accordance with the provisions of statistical legislation (Article 21.3 of the Law 12/1989 of 9 May 1989 on the Public Statistical Function - LFEP) and in the regulations governing each institution.

  • What challenges do data access requesters and signatories face?

Regardless of the scientific merits of the research proposal, it is essential to call on the institutions deploying these projects to significantly raise the quality of their data protection and information security compliance processes. But this will not be enough: the deployment of artificial intelligence requires additional processes, such as those set out in the document of the Conference of Rectors of Spanish Universities (CRUE) ICT 360º, addressed to universities in 2023 for their adoption. While it is true that the Artificial Intelligence Act proposes a scenario of lighter regulation for basic research, it also requires a high level of ethical deployment. To this end, it will be essential to apply principles of artificial intelligence ethics, using the ALTAI model (Assessment List for Trustworthy Artificial Intelligence) or an alternative, and to carry out a Fundamental Rights Impact Assessment (FRAIA), without neglecting the demanding legal requirements for the development of market-oriented systems. Beyond the formal declarations of the agreement, the lessons learned from European projects point to the need for a procedural framework of evidence-based legal and ethical verification of research projects and of the capacities of the institutions requesting access to data.

From the point of view of the signatory institutions, in addition to the challenge of the economic sustainability of the model, foreseen and regulated in the agreement, the need for a regulatory investment strategy seems evident. We have no doubt that each data repository and the processes underpinning it have been subject to a data protection impact assessment and to security methodologies linked to the National Security Scheme. Data protection by design and by default, and compliance with the recommendations on anonymisation and data space management mentioned above, will be further elements to consider. This translates into processes, but also into people - chief data officers, data analysts, and other mediators such as data protection officers - together with a high level of security requirements. In addition, the duty of transparency vis-à-vis citizens will require efficient channels and a very precise risk management model in the event of a possible mass exercise of the right to object to processing, without prejudice to its feasibility.

Finally, the Spanish Data Protection Agency should approach this process in a proactive and promotional way without renouncing its role as guarantor of fundamental rights, but contributing to the development of functional solutions. This is not just any agreement but an essential test bed for the future of data research in Spain.

In our opinion, the most exciting statement of these institutions consists of understanding the agreement "as the embryo of the future System of access to data for research for scientific purposes of public interest, which must be in accordance with the Spanish and European strategy on data and the legislation on its governance, within a framework of development of public sector data spaces, and respecting in any case the autonomy and the legal regime applicable to the Banco de España".


Content prepared by Ricard Martínez, Director of the Chair of Privacy and Digital Transformation. Professor, Department of Constitutional Law, Universitat de València. The contents and points of view reflected in this publication are the sole responsibility of its author.
