Legislation and justice

More transparency in AI: new template for documenting general-purpose model training data

Blog

Artificial Intelligence (AI) is transforming society, the economy and public services at an unprecedented speed. This revolution brings enormous opportunities, but also challenges related to ethics, security and the protection of fundamental rights. Aware of this, the European Union approved the Artificial Intelligence Act (AI Act), in force since August 1, 2024, which establishes a harmonized and pioneering framework for the development, commercialization and use of AI systems in the single market, fostering innovation while protecting citizens.

A particularly relevant area of this regulation is general-purpose AI models (GPAI), such as large language models (LLMs) or multimodal models, which are trained on huge volumes of data from a wide variety of sources (text, images and video, audio and even user-generated data). This reality poses critical challenges in intellectual property, data protection and transparency on the origin and processing of information.

To address them, the European Commission, through the European AI Office, has published the Template for the Public Summary of Training Content for general-purpose AI models: a standardized format that providers will be required to complete and publish to summarize key information about the data used in training. From 2 August 2025, any general-purpose model placed on the market or distributed in the EU must be accompanied by this summary; models already on the market have until 2 August 2027 to adapt. This measure materializes the AI Act's principle of transparency and aims to shed light on the "black boxes" of AI.

In this article, we explain the keys to this template: from its objectives and structure to deadlines, penalties and next steps.

Objectives and relevance of the template

General-purpose AI models are trained on data from a wide variety of sources and modalities, such as:

Text: books, scientific articles, press, social networks.
Images and videos: digital content from the Internet and visual collections.
Audio: recordings, podcasts, radio programs, or conversations.
User data: information generated in interaction with the model itself or with other services of the provider.

This process of mass data collection is often opaque, raising concerns among rights holders, users, regulators, and society as a whole. Without transparency, it is difficult to assess whether data has been obtained lawfully, whether it includes unauthorised personal information or whether it adequately represents the cultural and linguistic diversity of the European Union.

Recital 107 of the AI Act states that the main objective of this template is to increase transparency and facilitate the exercise and protection of rights. Among the benefits it provides, the following stand out:

Intellectual property protection: allows authors, publishers and other rights holders to identify if their works have been used during training, facilitating the defense of their rights and a fair use of their content.
Privacy safeguard: helps detect whether personal data has been used, providing useful information so that affected individuals can exercise their rights under the General Data Protection Regulation (GDPR) and other regulations in the same field.
Prevention of bias and discrimination: provides information on the linguistic and cultural diversity of the sources used, key to assessing and mitigating biases that may lead to discrimination.
Fostering competition and research: reduces "black box" effects and facilitates academic scrutiny, while helping other companies better understand where data comes from, favoring more open and competitive markets.

In short, this template is not only a legal requirement, but a tool to build trust in artificial intelligence, creating an ecosystem in which technological innovation and the protection of rights are mutually reinforcing.

Template structure

The template, officially published on 24 July 2025 after a public consultation with more than 430 participating organisations, has been designed so that the information is presented in a clear, homogeneous and understandable way, both for specialists and for the public.

It consists of three main sections, ranging from basic model identification to legal aspects related to data processing.

1. General information

It provides a global view of the provider, the model, and the general characteristics of the training data:

Identification of the supplier, such as name and contact details.
Identification of the model and its versions, including dependencies if it is a modification (fine-tuning) of another model.
Date of placing the model on the market in the EU.
Data modalities used (text, image, audio, video, or others).
Approximate size of data by modality, expressed in wide ranges (e.g., less than 1 billion tokens, between 1 billion and 10 trillion, more than 10 trillion).
Language coverage, with special attention to the official languages of the European Union.

This section provides a level of detail sufficient to understand the extent and nature of the training, without revealing trade secrets.
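
As an illustration of how the wide size ranges could be reported, the following minimal sketch maps an approximate token count to one of the three example ranges mentioned above. The function name and the treatment of the exact boundary values are assumptions made for illustration; they are not taken from the template.

```python
# Band edges taken from the examples in the text (short-scale numbers).
ONE_BILLION = 10**9
TEN_TRILLION = 10**13


def token_size_band(token_count: int) -> str:
    """Map an approximate token count to one of the three wide ranges.

    How the exact boundary values are assigned is an assumption of this sketch.
    """
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    if token_count < ONE_BILLION:
        return "less than 1 billion tokens"
    if token_count <= TEN_TRILLION:
        return "between 1 billion and 10 trillion tokens"
    return "more than 10 trillion tokens"


print(token_size_band(750_000_000))
print(token_size_band(2 * 10**12))
print(token_size_band(3 * 10**13))
```

Reporting a band rather than an exact figure is what allows the summary to convey scale without exposing precise training details.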

2. List of data sources

It is the core of the template, where the origin of the training data is detailed. It is organized into six main categories, plus a residual category (other).

Public datasets:
- Data that is freely available and downloadable as a whole or in blocks (e.g., open data portals, Common Crawl, scholarly repositories).
- "Large" sets must be identified, defined as those that represent more than 3% of the total public data used in a specific modality.
Licensed private sets:
- Data obtained through commercial agreements with rights holders or their representatives, such as licenses with publishers for the use of digital books.
- A general description is provided only.
Other unlicensed private data:
- Databases acquired from third parties that do not directly manage copyright.
- If they are publicly known, they must be listed; otherwise, a general description (data type, nature, languages) is sufficient.
Data obtained through web crawling/scraping:
- Information collected by or on behalf of the supplier using automated tools.
- It must be specified:
  - Name/identifier of the crawlers.
  - Purpose and behavior (respect for robots.txt, captchas, paywalls, etc.).
  - Collection period.
  - Types of websites (media, social networks, blogs, public portals, etc.).
  - List of most relevant domains, covering at least the top 10% by volume. For SMEs, this requirement is adjusted to 5% or a maximum of 1,000 domains, whichever is less.
User data:
- Information generated through interaction with the model or with other provider services.
- It must indicate which services contribute and the modality of the data (text, image, audio, etc.).
Synthetic data:
- Data created by or for the supplier using other AI models (e.g., model distillation or reinforcement learning from human feedback, RLHF).
- Where appropriate, the generator model should be identified if it is available in the market.

Additional category – Other: Includes data that does not fit into the above categories, such as offline sources, self-digitization, manual tagging, or human generation.
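
Two of the thresholds described in this section are simple arithmetic and can be sketched in code. The example below is a hedged illustration, not an official calculation method: it assumes that "large" means strictly more than 3% of the public data in one modality, and that the "top 10%" refers to the top tenth of domain names ranked by scraped volume. All function names and sample data are hypothetical.

```python
def large_public_datasets(size_by_dataset: dict[str, int], threshold_pct: int = 3) -> list[str]:
    """Names of public datasets exceeding threshold_pct of one modality's public data."""
    total = sum(size_by_dataset.values())
    # Integer comparison avoids floating-point surprises at the boundary.
    return sorted(
        name for name, size in size_by_dataset.items()
        if size * 100 > threshold_pct * total
    )


def domains_to_list(volume_by_domain: dict[str, int], is_sme: bool = False) -> list[str]:
    """Top domains by scraped volume that would go into the summary list."""
    # Rank by volume (largest first), breaking ties alphabetically.
    ranked = sorted(volume_by_domain, key=lambda d: (-volume_by_domain[d], d))
    pct = 5 if is_sme else 10
    count = (len(ranked) * pct + 99) // 100  # ceiling of pct% of the domain count
    if is_sme:
        count = min(count, 1000)  # "whichever is less"
    return ranked[:count]


# Hypothetical sizes (any consistent unit) for the public text datasets of a model.
public_text = {"open-portal-dump": 900, "scholarly-repo": 80, "small-forum-set": 20}
print(large_public_datasets(public_text))

# Hypothetical crawl volumes for 20 domains.
crawl = {f"site{i:02d}.example": 1000 - i for i in range(20)}
print(domains_to_list(crawl))
print(domains_to_list(crawl, is_sme=True))
```

In practice, a provider would need to confirm against the template itself how volume is measured and how ties or boundary cases are treated.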

3. Aspects of data processing

It focuses on how data has been handled before and during training, with a particular focus on legal compliance:

Respect for text and data mining (TDM) opt-outs: measures taken to honour the reservation of rights provided for in Article 4(3) of Directive (EU) 2019/790 on copyright, which allows rightholders to prevent the mining of their texts and data. This right is exercised through opt-out protocols, such as tags in files or directives in robots.txt, which indicate that certain content may not be used to train models. Providers should explain how they have identified and respected these opt-outs, both in their own datasets and in those acquired from third parties.
Removal of illegal content: procedures used to prevent or remove content that is illegal under EU law, such as child sexual abuse material, terrorist content or serious intellectual property infringements. These mechanisms may include blacklists, automatic classifiers or human review, and can be described without revealing trade secrets.
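
To make the robots.txt opt-out mechanism more concrete, here is a minimal sketch that uses Python's standard `urllib.robotparser` module to check whether a crawler may fetch a URL. The crawler name `ExampleTrainingBot`, the sample robots.txt content and the URLs are hypothetical. robots.txt is only one of the possible opt-out signals, so a check like this would be one part of a broader procedure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is excluded from the whole site,
# while every other crawler is only excluded from /private/.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch(ROBOTS_TXT, "ExampleTrainingBot", "https://example.org/articles/1"))
print(may_fetch(ROBOTS_TXT, "OtherBot", "https://example.org/articles/1"))
print(may_fetch(ROBOTS_TXT, "OtherBot", "https://example.org/private/x"))
```

Recording the outcome of such checks at collection time would also make it easier to describe crawler behaviour in the summary.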

The following diagram summarizes these three sections:

Balancing transparency and trade secrets

The European Commission has designed the template seeking a delicate balance: offering sufficient information to protect rights and promote transparency, without forcing the disclosure of information that could compromise the competitiveness of suppliers.

Public sources: the highest level of detail is required, including names and links to "large" datasets.
Private sources: a more limited level of detail is allowed, through general descriptions when the information is not public.
Web scraping: a summary list of domains is required, without the need to detail exact combinations.
User and synthetic data: the information is limited to confirming its use and describing the modality.

Thanks to this approach, the summary is "generally complete" in scope, but not "technically detailed", protecting both transparency and the intellectual and commercial property of companies.

Compliance, deadlines and penalties

Article 53 of the AI Act details the obligations of general-purpose model providers, most notably the publication of this summary of training data.

This obligation is complemented by other measures, such as:

Have a public copyright policy.
Implement risk assessment and mitigation processes, especially for models that may generate systemic risks.
Establish mechanisms for traceability and supervision of data and training processes.

Non-compliance can lead to significant fines, up to €15 million or 3% of the company's annual global turnover, whichever is higher.
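
The "whichever is higher" rule can be expressed as a one-line calculation. The sketch below only computes the upper limit described above, using whole euros; the function name is hypothetical, and the actual amount of any fine would depend on the circumstances of each case.

```python
FIXED_CEILING_EUR = 15_000_000
TURNOVER_PCT = 3


def fine_ceiling_eur(annual_global_turnover_eur: int) -> int:
    """Upper limit of the fine: the higher of EUR 15 million or 3% of turnover."""
    turnover_based = annual_global_turnover_eur * TURNOVER_PCT // 100
    return max(FIXED_CEILING_EUR, turnover_based)


print(fine_ceiling_eur(100_000_000))    # 3% is EUR 3 million, so the fixed ceiling applies
print(fine_ceiling_eur(2_000_000_000))  # 3% is EUR 60 million, which exceeds the fixed ceiling
```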

Next Steps for Suppliers

To adapt to this new obligation, providers should:

Review internal data collection and management processes to ensure that necessary information is available and verifiable.
Establish clear transparency and copyright policies, including protocols to respect the right of exclusion in text and data mining (TDM).
Publish the summary on official channels before the corresponding deadline.
Update the summary periodically, at least every six months or when there are material changes in training.

The European Commission, through the European AI Office, will monitor compliance and may request corrections or impose sanctions.
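
The dates mentioned in this article can be turned into a small planning aid. The following sketch assumes that models placed on the market before 2 August 2025 have until 2 August 2027, that later models need the summary at the time of placement, and that a review is due at most six months after the last publication. The function names are hypothetical, and the sketch does not cover updates triggered by material changes in training.

```python
import calendar
from datetime import date

OBLIGATION_START = date(2025, 8, 2)
LEGACY_DEADLINE = date(2027, 8, 2)


def summary_due_date(placed_on_market: date) -> date:
    """Date by which the public summary should be available."""
    if placed_on_market < OBLIGATION_START:
        return LEGACY_DEADLINE  # model was already on the market
    return placed_on_market     # summary accompanies the model from placement


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_review_date(last_published: date) -> date:
    """Latest date for the next periodic update (six months after publication)."""
    return add_months(last_published, 6)


print(summary_due_date(date(2024, 5, 1)))
print(summary_due_date(date(2025, 9, 15)))
print(next_review_date(date(2025, 8, 31)))
```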

A key tool for governing data

In our previous article, "Governing Data to Govern Artificial Intelligence", we highlighted that reliable AI is only possible if there is a solid governance of data.

This new template reinforces that principle, offering a standardized mechanism for describing the lifecycle of data, from source to processing, and encouraging interoperability and responsible reuse.

This is a decisive step towards AI that is more transparent, fair and aligned with European values, where the protection of rights and technological innovation can advance together.

Conclusions

The publication of the Public Summary Template marks a historic milestone in the regulation of AI in Europe. By requiring providers to document and make public the data used in training, the European Union is taking a decisive step towards a more transparent and trustworthy artificial intelligence, based on responsibility and respect for fundamental rights. In a world where data is the engine of innovation, this tool becomes the key to governing data before governing AI, ensuring that technological development is built on trust and ethics.

Content created by Dr. Fernando Gualo, Professor at UCLM and Data Governance and Quality Consultant. The content and views expressed in this publication are the sole responsibility of the author.

13/10/2025

Artificial intelligence, data and responsibilities

When dealing with the liability arising from the use of autonomous systems based on artificial intelligence, it is common to refer to the ethical dilemmas that a traffic accident can pose. This example is useful to illustrate the problem of liability for damages caused by an accident or even to determine other types of liability in the field of road safety (for example, fines for violations of traffic rules).

Let's imagine that the autonomous vehicle has been driving above the speed limit, or that it has simply ignored a traffic signal and caused an accident involving other vehicles. From the point of view of legal risks, the resulting liability and, specifically, the impact of data in this scenario, we could ask some questions that help us understand the practical scope of this problem:

Have all the necessary datasets of sufficient quality to deal with traffic risks in different environments (rural, urban, dense cities, etc.) been considered in the design and training?
What is the responsibility if the accident is due to poor integration of the artificial intelligence tool with the vehicle or a failure of the manufacturer that prevents the correct reading of the signs?
Who is responsible if the problem stems from incorrect or outdated information on traffic signs?

In this post we are going to explain what aspects must be considered when assessing the liability that can be generated in this type of case.

The impact of data from the perspective of the subjects involved

In the design, training, deployment and use of artificial intelligence systems, the effective control of the data used plays an essential role in the management of legal risks. The conditions of its processing can have important consequences from the perspective of liability in the event of damage or non-compliance with the applicable regulations.

A rigorous approach to this problem requires distinguishing according to each of the subjects involved in the process, from its initial development to its effective use in specific circumstances, since the conditions and consequences can be very different. In this sense, it is necessary to identify the origin of the damage or non-compliance in order to impute the legal consequences to the person who should effectively be considered responsible:

Thus, damage or non-compliance may be determined by a design problem in the application used or in its training, so that certain data is misused for this purpose. Continuing with the example of an autonomous vehicle, this would be the case of accessing the data of the people traveling in it without consent.
However, it is also possible that the problem originates from the person who deploys the tool in each environment for real use, a position that would be occupied by the vehicle manufacturer. This could happen if, for its operation, data is accessed without the appropriate permissions or if there are restrictions that prevent access to the information necessary to guarantee its proper functioning.
The problem could also be generated by the person or entity using the tool itself. Returning to the example of the vehicle, it could be stated that the ownership of the vehicle corresponds to a company or an individual that has not carried out the necessary periodic inspections or updated the system when necessary.
Finally, there is the possibility that the legal problem of liability is determined by the conditions under which the data are provided at their original source. For example, if the data is inaccurate: the information about the road on which the vehicle is traveling is not up to date or the data emitted by traffic signs is not sufficiently accurate.

Challenges related to the technological environment: complexity and opacity

In addition, the very nature of the technology used may significantly condition the attribution of liability. Specifically, technological opacity (that is, the difficulty in understanding why a system makes a specific decision) is one of the main obstacles when addressing the legal challenges posed by artificial intelligence, as it makes it difficult to determine who is responsible. This problem is especially important with regard to the lawful origin of the data and the conditions under which it is processed. In fact, this was precisely the main stumbling block that generative artificial intelligence encountered when it first arrived in Europe: the lack of adequate transparency regarding the processing of personal data justified the temporary halt of its commercialization until the necessary adjustments were made.

In this sense, the publication of the data used for the training phase becomes an additional guarantee from the perspective of legal certainty and, specifically, to verify the regulatory compliance conditions of the tool.

On the other hand, the complexity inherent in this technology poses an additional difficulty in terms of the imputation of the damage that may be caused and, consequently, in the determination of who should pay for it. Continuing with the example of the autonomous vehicle, it could be the case that various causes overlap, such as the inaccuracy of the data provided by traffic signs and, at the same time, a malfunction of the computer application by not detecting potential inconsistencies between the data used and its actual needs.

What does the European Regulation on artificial intelligence say about it?

Regulation (EU) 2024/1689 establishes a harmonised regulatory framework across the European Union in relation to artificial intelligence. With regard to data, it includes some specific obligations for systems classified as "high risk", which are those contemplated in Article 6 and in the list in Annex III (biometric identification, education, labour management, access to essential services, etc.). In this sense, it incorporates a strict regime of technical requirements, transparency, supervision and auditing, combined with conformity assessment procedures prior to its commercialization and post-market control mechanisms, also establishing precise responsibilities for suppliers, operators and other actors in the value chain.

As regards data governance, a risk management system should be put in place covering the entire lifecycle of the tool and assessing, mitigating, monitoring and documenting risks to health, safety and fundamental rights. Specifically, training, validation, and testing datasets are required to be:

Relevant, representative, complete and as error-free as possible for the intended purpose.
Managed in accordance with strict governance practices that mitigate bias and discrimination, especially when they may affect the fundamental rights of vulnerable or minority groups.
The Regulation also lays down strict conditions for the exceptional use of special categories of personal data with regard to the detection and, where appropriate, correction of bias.

With regard to technical documentation and record keeping, the following are required:

The preparation and maintenance of exhaustive technical documentation. In particular, with regard to transparency, complete and clear instructions for use should be provided, including information on data and output results, among other things.
Systems should allow for the automatic recording of relevant events (logs) throughout their life cycle to ensure traceability and facilitate post-market surveillance, which can be very useful when checking the incidence of the data used.
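
As a rough idea of what automatic event recording might look like in practice, the sketch below writes one JSON object per event using Python's standard `logging` module. The logger name, the event types and fields such as `dataset_version` are hypothetical; the Regulation does not prescribe this format.

```python
import io
import json
import logging
from datetime import datetime, timezone


def make_event_logger(stream) -> logging.Logger:
    """Create a logger that writes one JSON line per event to the given stream."""
    logger = logging.getLogger("ai_system_events")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_event(logger: logging.Logger, event_type: str, **details) -> dict:
    """Record a timestamped event, including which data was involved."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event_type,
        **details,
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record


buffer = io.StringIO()
events = make_event_logger(buffer)
log_event(events, "map_data_loaded", dataset_version="2025-10-01", source="road-signs")
log_event(events, "decision", action="brake", dataset_version="2025-10-01")
print(buffer.getvalue())
```

Keeping the dataset version in every record is what would later allow an investigator to check which data was in use when a given decision was made.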

As regards liability, that regulation is based on an approach that is admittedly limited from two points of view:

Firstly, it merely empowers Member States to establish a sanctioning regime that provides for fines and other means of enforcement, such as warnings and non-pecuniary measures, which must be effective, proportionate and dissuasive. They are, therefore, administrative and punitive instruments, that is, punishment for non-compliance with the obligations established in the regulation, among which are those relating to data governance, documentation and record keeping referred to above.
However, secondly, the European regulator has not considered it appropriate to establish specific provisions regarding civil liability with the aim of compensating for the damage caused. This is an issue of great relevance that even led the European Commission to formulate a proposal for a specific Directive in 2022. Although its processing has not been completed, it has given rise to an interesting debate whose main arguments have been systematised in a comprehensive report by the European Parliament analysing the impact that this regulation could have.

No clear answers: open debate and regulatory developments

Thus, despite the progress made by the approval of the 2024 Regulation, the truth is that the regulation of liability arising from the use of artificial intelligence tools remains an open question on which there is no complete and developed regulatory framework. However, now that the idea of granting legal personhood to robots, which arose a few years ago, has been set aside, it is unquestionable that artificial intelligence in itself cannot be considered a legally responsible subject.

As emphasized above, this is a complex debate in which it is not possible to offer simple and general answers, since it is essential to specify them in each specific case, taking into account the subjects that have intervened in each of the phases of design, implementation and use of the corresponding tool. It will therefore be these subjects who will have to assume the corresponding responsibility, either for the compensation of the damage caused or, where appropriate, to face the sanctions and other administrative measures in the event of non-compliance with the regulation.

In short, although the 2024 European regulation on artificial intelligence may be useful for establishing standards that help determine when damage is contrary to law and must therefore be compensated, the debate remains open and will have to be resolved by applying the general rules on consumer protection or defective products, taking into account the singularities of this technology. As far as administrative liability is concerned, it will be necessary to wait for the initiative that was announced a few months ago and that is pending formal approval by the Council of Ministers for its subsequent processing in the Spanish Parliament.

Content prepared by Julián Valero, Professor at the University of Murcia and Coordinator of the Research Group "Innovation, Law and Technology" (iDerTec). The contents and points of view reflected in this publication are the sole responsibility of its author.

07/10/2025

How to prepare your data to work with artificial intelligence tools from a legal point of view

The idea of conceiving artificial intelligence (AI) as a service for immediate consumption or utility, under the premise that it is enough to "buy an application and start using it", is gaining more and more ground. However, getting on board with AI isn't like buying conventional software and getting it up and running instantly. Unlike other information technologies, AI can hardly be used with a plug-and-play philosophy. There is a set of essential tasks that users of these systems should undertake, not only for security and legal compliance reasons, but above all to obtain efficient and reliable results.

The Artificial Intelligence Regulation (RIA)[1]

The RIA defines frameworks that should be taken into account by providers[2] and those responsible for deploying[3] AI. This is a very complex regulation whose orientation is twofold. Firstly, in an approach that we could define as high-level, the regulation establishes a set of red lines that can never be crossed. The European Union approaches AI from a human-centred perspective, with technology at the service of people. Therefore, any development must first and foremost ensure that fundamental rights are not violated and that no harm is caused to the safety and integrity of people. In addition, no AI that could generate systemic risks to democracy and the rule of law will be admitted. Secondly, for these objectives to materialize, the RIA deploys a set of processes through a product-oriented approach. This makes it possible to classify AI systems according to their level of risk (low, medium, high), as well as general-purpose AI models[4], and to establish, based on this categorization, the obligations that each participating subject must comply with to guarantee the objectives of the regulation.

Given the extraordinary complexity of the European regulation, we would like to share in this article some common principles that can be deduced from reading it and could inspire good practices on the part of public and private organisations. Our approach is not so much on defining a roadmap for a given information system as on highlighting some elements that we believe can be useful in ensuring that the deployment and use of this technology are safe and efficient, regardless of the level of risk of each AI-based information system.

Define a clear purpose

The deployment of an AI system is highly dependent on the purpose pursued by the organization. It is not about jumping on the bandwagon. It is true that the available public information seems to show that the integration of this type of technology is an important part of the digital transformation processes of companies and the Administration, providing greater efficiency and capabilities. However, installing one of the Large Language Models (LLMs) cannot become a mere fad. Prior reflection is needed that takes into account the needs of the organization and defines what type of AI will contribute to improving its capabilities. Not adopting this strategy could put the organization at risk, not only from the point of view of its operation and results, but also from a legal perspective. For example, introducing an LLM or chatbot into a high-risk decision-making environment could result in reputational impacts or liability. Inserting an LLM in a medical environment, or using a chatbot in a sensitive context with an unprepared population or in critical care processes, could end up generating risk situations with unforeseeable consequences for people.

Do no evil

The principle of non-maleficence is a key element and should decisively inspire our practice in the world of AI. For this reason, the RIA establishes a series of practices expressly prohibited to protect the fundamental rights and security of people. These prohibitions focus on preventing manipulation, discrimination, and misuse of AI systems that can cause significant harm.

Categories of Prohibited Practices

1. Manipulation and control of behavior. Through the use of subliminal or manipulative techniques that alter the behavior of individuals or groups, preventing informed decision-making and causing considerable damage.

2. Exploiting vulnerabilities. Derived from age, disability or social/economic situation to substantially modify behavior and cause harm.

3. Social Scoring. AI that evaluates people based on their social behavior or personal characteristics, generating ratings with effects for citizens that result in unjustified or disproportionate treatment.

4. Criminal risk assessment based on profiles. AI used to predict the likelihood of committing crimes solely through profiling or personal characteristics. Its use for criminal investigation is nevertheless admitted when the crime has actually been committed and there are facts to be analyzed.

5. Facial recognition and biometric databases. Systems for the expansion of facial recognition databases through the non-selective extraction of facial images from the Internet or closed circuit television.

6. Inference of emotions in sensitive environments. Designing or using AI to infer emotions at work or in schools, except for medical or safety reasons.

7. Sensitive biometric categorization. Developing or using AI that classifies individuals based on biometric data to infer race, political opinions, religion, sexual orientation, etc.

8. Remote biometric identification in public spaces. Use of "real-time" remote biometric identification systems in public spaces for police purposes, with very limited exceptions (search for victims, prevention of serious threats, location of suspects of serious crimes).

Apart from the expressly prohibited conduct, it is important to bear in mind that the principle of non-maleficence implies that we cannot use an AI system with the clear intention of causing harm, with the awareness that this could happen or, in any case, when the purpose we pursue is contrary to law.

Ensure proper data governance

The concept of data governance is found in Article 10 of the RIA and applies to high-risk systems. However, it contains a set of principles that are well worth applying when deploying a system at any level of risk. High-risk AI systems that use data must be developed with training, validation, and testing datasets that meet quality criteria. To this end, certain governance practices are defined to ensure:

Proper design.
That the collection and origin of the data, and in the case of personal data the purpose pursued, are adequate and legitimate.
That preparation processes such as annotation, labeling, cleaning, updating, enrichment, and aggregation are adopted.
That the system is designed with use cases whose information is consistent with what the data is supposed to measure and represent.
That data quality is ensured by guaranteeing the availability, quantity, and adequacy of the necessary datasets.
That biases that may affect the health and safety of people or their rights, or generate discrimination, are detected and reviewed, especially when data outputs influence the input information of future operations. Measures should be taken to prevent and correct these biases.
That gaps or deficiencies in data that impede compliance with the RIA and, we would add, with other applicable legislation, are identified and resolved.
The datasets used should be relevant, sufficiently representative and have statistical properties appropriate for their intended use, and should consider the geographical, contextual or functional characteristics necessary for the system, as well as ensure its diversity. In addition, they should be, as far as possible, error-free and complete in view of their intended purpose.

AI is a technology that is highly dependent on the data that powers it. From this point of view, not having data governance can not only affect the operation of these tools, but could also generate liability for the user.
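
Some of these governance practices lend themselves to simple automated checks before data is fed into an AI tool. The sketch below is only an illustration under stated assumptions: it measures the share of missing values per field and flags groups that fall below a chosen share of the records. The field names, the sample records and the 15% threshold are hypothetical, and checks like these do not by themselves demonstrate compliance with Article 10.

```python
from collections import Counter


def missing_share(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of records in which each field is absent, None or empty."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


def underrepresented_groups(records: list[dict], key: str, min_share: float) -> list[str]:
    """Groups (values of `key`) whose share of the records is below min_share."""
    counts = Counter(r.get(key) for r in records)
    total = sum(counts.values())
    return sorted(str(group) for group, n in counts.items() if n / total < min_share)


# Hypothetical sample: ten labeled text records in three languages.
sample = (
    [{"language": "es", "label": "ok"} for _ in range(7)]
    + [{"language": "es", "label": None}]
    + [{"language": "ca", "label": "ok"}, {"language": "eu", "label": "ok"}]
)

print(missing_share(sample, ["language", "label"]))
print(underrepresented_groups(sample, "language", min_share=0.15))
```

What counts as an acceptable level of missing data or representation depends on the intended purpose of the system, so thresholds would need to be set case by case.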

In the not too distant future, the obligation for high-risk systems to obtain a CE marking issued by a notified body (i.e., designated by a member state of the European Union) will provide conditions of reliability to the market. However, for the rest of the lower-risk systems, the obligation of transparency applies. This does not at all imply that the design of this AI should not take these principles into account as far as possible. Therefore, before making a contract, it would be reasonable to verify the available pre-contractual information both in relation to the characteristics of the system and its reliability and with respect to the conditions and recommendations for deployment and use.

Another issue concerns our own organization. If we do not have the appropriate regulatory, organizational, technical and quality compliance measures that ensure the reliability of our own data, we will hardly be able to use AI tools that feed on it. In the context of the RIA, the user of a system may also incur liability. It is perfectly possible that a product of this nature has been properly developed by the supplier and that, in terms of reproducibility, the supplier can guarantee that under the right conditions the system works properly. What developers and vendors cannot solve are the inconsistencies in the datasets that the user-client integrates into the platform. It is not the supplier's responsibility if the customer has failed to properly deploy a General Data Protection Regulation compliance framework or is using the system for an unlawful purpose. Nor is the supplier responsible if the client maintains outdated or unreliable datasets that, when introduced into the tool, generate risks or contribute to inappropriate or discriminatory decision-making.

Consequently, the recommendation is clear: before implementing an AI-based system, we must ensure that data governance and compliance with current legislation are adequately guaranteed.

Ensuring security

AI is a particularly sensitive technology that presents specific security risks, such as the corruption of datasets. There is no need to look for far-fetched examples. Like any information system, AI must be deployed and used securely by organizations. Consequently, the deployment of AI in any environment requires a prior risk analysis that identifies the organizational and technical measures needed to guarantee safe use of the tool.

Train your staff

Unlike the GDPR, in which this issue is implicit, the RIA expressly establishes the duty to train as an obligation. Article 4 of the RIA is so precise that it is worthwhile to reproduce it in its entirety:

Providers and those responsible for deploying AI systems shall take measures to ensure that, to the greatest extent possible, their staff and others responsible on their behalf for the operation and use of AI systems have a sufficient level of AI literacy, taking into account their technical knowledge, experience, education and training, as well as the intended context of use of the AI systems and the individuals or groups of people on whom those systems are to be used.

This is certainly a critical factor. People who use artificial intelligence must have been given adequate training that allows them to understand the nature of the system and make informed decisions. One of the core principles of the European legislation and approach is human supervision. Therefore, regardless of the guarantees offered by a given market product, the organization that uses it will always be responsible for the consequences. This applies both where the final decision is attributed to a person and where, in highly automated processes, those responsible for managing them fail to identify an incident and take appropriate decisions under human supervision.

Culpa in vigilando

The massive introduction of LLMs poses the risk of incurring the so-called culpa in vigilando: a legal principle that refers to the responsibility assumed by a person for not having exercised due vigilance over another, when that lack of control results in damage or harm. If your organization has introduced any of these market products that integrate functions such as reporting, evaluating alphanumeric information or even assisting with email management, it will be critical to ensure compliance with the recommendations outlined above. It is particularly advisable to define very precisely the purposes for which the tool is implemented and the roles and responsibilities of each user, to document their decisions and to train staff appropriately.

Unfortunately, the way LLMs have been introduced into the market has itself generated a serious systemic risk for organizations. Most tools have opted for a marketing strategy no different from the one used by social networks in their day: they allow open and free access to anyone. This obviously achieves two results: it monetizes the product by reusing the information provided by users, and it generates a culture of use that facilitates the adoption and commercialization of the tool.

Let us imagine a hypothetical and, of course, far-fetched case. A resident intern (MIR) has discovered that several of these tools have been developed and are in fact used in another country for differential diagnosis. Our MIR is very worried about having to wake up the hospital's head of medical duty every 15 minutes. So, diligently, he subscribes to a tool that has not been intended for that use in Spain and makes decisions based on the differential diagnosis proposed by an LLM, without yet having the skills needed to exercise human supervision. Obviously, there is a significant risk of ending up causing harm to a patient.

Situations such as the one described force us to consider how organizations should act when they do not use AI but are aware of the risk that their employees may use it without their knowledge or consent. In this regard, a preventive strategy should be adopted, based on issuing very precise circulars and instructions prohibiting its use. There is also a hybrid risk situation: the LLM has been contracted by the organization but is used by the employee for purposes other than those intended. In this case, the combination of security and training acquires strategic value.

Training and the acquisition of culture about artificial intelligence are probably an essential requirement for society as a whole. Otherwise, the systemic problems and risks that in the past affected the deployment of the Internet will happen again and who knows if with an intensity that is difficult to govern.

Content prepared by Ricard Martínez, Director of the Chair of Privacy and Digital Transformation. Professor, Department of Constitutional Law, Universitat de València. The contents and points of view reflected in this publication are the sole responsibility of its author.

NOTES:

[1] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828, available at https://eur-lex.europa.eu/legal-content/ES/TXT/?uri=OJ%3AL_202401689

[2] The RIA defines 'provider' as a natural or legal person, public authority, body or agency that develops an AI system or a general-purpose AI model or for which an AI system or a general-purpose AI model is developed and places it on the market or puts the AI system into service under its own name or brand; for a fee or free of charge.

[3] The RIA defines 'deployer' (the person responsible for deployment) as a natural or legal person, or public authority, body, office or agency that uses an AI system under its own authority, except where its use is part of a personal activity of a non-professional nature.

[4] The RIA defines a 'general-purpose AI model' as an AI model, including one trained on a large volume of data using large-scale self-supervision, which has a considerable degree of generality and is capable of competently performing a wide variety of different tasks, regardless of how the model is placed on the market, and which can be integrated into various downstream systems or applications, except for AI models that are used for research, development or prototyping activities prior to their placing on the market.

22/09/2025

The role of data in driving autonomous vehicles

Blog

Just a few days ago, the Directorate General of Traffic published the new Framework Programme for the Testing of Automated Vehicles which, among other measures, contemplates "the mandatory delivery of reports, both periodic and final and in the event of incidents, which will allow the DGT to assess the safety of the tests and publish basic information [...] guaranteeing transparency and public trust."

The advancement of digital technology is making it easier for the transport sector to face an unprecedented revolution in autonomous vehicle driving, offering significant improvements in road safety, energy efficiency and mobility accessibility.

The final deployment of these vehicles depends to a large extent on the availability, quality and accessibility of large volumes of data, as well as on an appropriate legal framework that ensures the protection of the various legal interests involved (personal data, trade secrets, confidentiality, etc.), traffic safety and transparency. In this context, open data and the reuse of public sector information are essential elements for the responsible development of autonomous mobility, in particular when it comes to ensuring adequate levels of traffic safety.

The dependence of autonomous vehicles on data

The technology that supports autonomous vehicles is based on the integration of a complex network of advanced sensors, artificial intelligence systems and real-time processing algorithms, which allows them to identify obstacles, interpret traffic signs, predict the behavior of other road users and, in a collaborative way, plan routes completely autonomously.

In the autonomous vehicle ecosystem, the availability of quality open data is strategic for:

Improving road safety, since real-time traffic data can be used to anticipate dangers, avoid accidents and optimise safe routes based on massive data analysis.
Optimising operational efficiency, as access to up-to-date information on the state of roads, works, incidents and traffic conditions allows for more efficient planning of journeys.
Promoting sectoral innovation, by enabling the creation of new digital tools that facilitate mobility.

Specifically, ensuring the safe and efficient operation of this mobility model requires continuous access to two key categories of data:

Variable or dynamic data, which offers constantly changing information such as the position, speed and behaviour of other vehicles, pedestrians, cyclists or weather conditions in real time.
Static data, which includes relatively permanent information such as the exact location of traffic signs, traffic lights, lanes, speed limits or the main characteristics of the road infrastructure.
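The distinction between the two categories can be made concrete with a small data-model sketch in Python. Everything here is an assumption made for illustration: the class names, the fields and the five-second freshness window are hypothetical and are not taken from any traffic data standard or from the Framework Programme.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StaticRecord:
    """Relatively permanent information, e.g. the speed limit of a road segment."""
    segment_id: str
    speed_limit_kmh: int


@dataclass(frozen=True)
class DynamicRecord:
    """Constantly changing information, e.g. an observed speed at a given instant."""
    segment_id: str
    observed_speed_kmh: float
    observed_at: datetime


def is_fresh(record, now, max_age=timedelta(seconds=5)):
    """Dynamic data becomes obsolete quickly; older observations are treated as stale.

    The 5-second default is an arbitrary, illustrative threshold.
    """
    return now - record.observed_at <= max_age


# Fixed, invented timestamps so the example is reproducible.
now = datetime(2025, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
limit = StaticRecord(segment_id="A-7/km-312", speed_limit_kmh=100)
recent = DynamicRecord("A-7/km-312", 87.5, now - timedelta(seconds=2))
old = DynamicRecord("A-7/km-312", 92.0, now - timedelta(seconds=30))

print(is_fresh(recent, now), is_fresh(old, now))
```

The design point is that static records can be published and cached with infrequent updates, whereas dynamic records carry a timestamp and must be re-evaluated for freshness every time they are used.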

The prominence of the data provided by public entities

The sources from which such data come are certainly diverse. This is of great relevance as regards the conditions under which such data will be available. Specifically, some of the data are provided by public entities, while in other cases the origin comes from private companies (vehicle manufacturers, telecommunications service providers, developers of digital tools...) with their own interests or even from people who use public spaces, devices and digital applications.

This diversity requires a differentiated approach to making data available under appropriate conditions, especially because of the difficulties that may arise from a legal point of view. In relation to Public Administrations, Directive (EU) 2019/1024 on open data and the reuse of public sector information establishes clear obligations that would apply, for example, to the Directorate General of Traffic, the Administrations owning public roads or municipalities in the case of urban environments. Likewise, Regulation (EU) 2022/868 on European data governance reinforces this regulatory framework, especially with regard to safeguarding the rights of third parties and, in particular, the protection of personal data.

Moreover, some datasets should be provided under the conditions established for dynamic data, i.e. those "subject to frequent or real-time updates, due in particular to their volatility or rapid obsolescence", which should be available "for re-use immediately after collection, through appropriate APIs and, where appropriate, in the form of a bulk download."

One might even think that the high-value data category is of particular interest in the context of autonomous vehicles, given its potential to:

Promote technological innovation, as such data would make it easier for manufacturers, developers and operators to access reliable and up-to-date information, essential for the development, validation and continuous improvement of autonomous driving systems.
Facilitate monitoring and evaluation from a safety perspective, for which the transparency and accessibility of such data are essential prerequisites.
Boost the development of advanced services, since data on road infrastructure, signage, traffic and even the results of tests carried out in the context of the aforementioned Framework Programme constitute the basis for new mobility applications and services that benefit society as a whole.

However, traffic-related data are not expressly included in the definition of high-value data adopted at European level, so that, at least for the time being, public entities cannot be required to disseminate the data relevant to autonomous vehicles under the specific conditions established for high-value data. Nevertheless, at this time of transition towards the deployment of autonomous vehicles, it is essential that public administrations publish certain datasets, and keep them updated under conditions suitable for automated processing, such as those relating to:

Road signs and vertical signage elements.
Traffic light states and traffic control systems.
Lane configuration and characteristics.
Information on works and temporary traffic alterations.
Road infrastructure elements critical for autonomous navigation.

The recent update of the official catalogue of traffic signs, which comes into force on 1 July 2025, incorporates signs adapted to new realities, such as personal mobility. However, greater specificity is still needed regarding the availability of data on road signs under these conditions, which will require the intervention of the authorities responsible for road signage.

The availability of data in the context of the European Mobility Area

Based on these conditions and the need to have mobility data generated by private companies and individuals, data spaces appear as the optimal legal and governance environment to facilitate their accessibility under appropriate conditions.

In this regard, the initiatives for the deployment of the European Mobility Data Space, created in 2023, are an opportunity to build into its design and configuration measures that support the data access needs of autonomous vehicles. Thus, within the framework of this initiative, it would be possible to unlock the potential of mobility data, and in particular to:

Facilitate the availability of data under conditions specific to the needs of autonomous vehicles.
Promote the interconnection of various data sources linked to existing means of transport, but also emerging ones.
Accelerate the digital transformation of autonomous vehicles.
Strengthen the digital sovereignty of the European automotive industry, reducing dependence on large foreign technology corporations.

In short, autonomous vehicles can represent a fundamental transformation in mobility as it has been conceived until now, but their development depends, among other factors, on the availability, quality and accessibility of sufficient and adequate data. The Sustainable Mobility Bill currently being processed in Parliament is a great opportunity to strengthen the role of data in facilitating innovation in this area, which would undoubtedly favour the development of autonomous vehicles. To this end, it will be essential, on the one hand, to have a data sharing environment that makes access to data compatible with the appropriate guarantees for fundamental rights and information security; and, on the other hand, to design a governance model that, as emphasised in the Programme promoted by the Directorate-General for Traffic, facilitates the collaborative participation of "manufacturers, developers, importers and fleet operators established in Spain or the European Union", which poses significant challenges in the availability of data.

01/07/2025

Training in artificial intelligence: a strategic necessity and a legal obligation

Blog

The 2024 Work Trend Index on the state of artificial intelligence in the workplace, together with reports from T-Systems and InfoJobs, indicates that 78% of workers in Spain use their own AI tools in the workplace. This figure rises to 80% in medium-sized companies. In addition, 1 in 3 workers (32%) use AI tools in their day-to-day work. 75% of knowledge workers use generative AI tools, and almost half have started doing so in the last six months. Interestingly, the generation gap is narrowing in this area. While 85% of Generation Z employees (18-28 years old) use personalised AI, more than 70% of baby boomers (58+) also use these tools. In fact, this trend seems to be confirmed by different sources.

Title of the study	Source
2024 Work Trend Index: AI at work is here. Now comes the hard part	Microsoft, LinkedIn
2024 AI Adoption and Risk Report	Cyberhaven Labs
Generative AI's fast and furious entry into Switzerland	Deloitte Switzerland
Bring Your Own AI: Balance Rewards and Risks (Webinar)	MIT Sloan
Lin, L. and Parker, K. (2025) U.S. workers are more worried than hopeful about future AI use in the Workplace	Pew Research Center

Figure 1. References on BYOAI

This phenomenon has been called BYOAI (Bring Your Own AI). It is characterised by the fact that the employee typically uses some freely accessible solution such as ChatGPT. The organisation has not contracted the service, the registration has been made privately by the user and the provider obviously assumes no legal responsibility. If, for example, the possibilities offered by Notebook, Perplexity or DeepSeek are used, it is perfectly possible to upload confidential or protected documents.

On the other hand, this coincides, according to Eurostat data, with the adoption of AI in the corporate sector. In 2024, 13.5% of European companies (with 10 or more employees) were using some form of AI technology, a figure that rises to 41% in large companies and is particularly high in sectors such as information and communication (48.7%) and professional, scientific and technical services (30.5%). The trend towards AI adoption in the public sector is also growing, due not only to global trends but probably also to the adoption of AI strategies and the positive impact of Next Generation funds.

The legal duty of AI literacy

In this context, questions immediately arise. The first concern the phenomenon of unauthorised use by employed persons: Has the data protection officer or the security officer issued a report to the management of the organisation? Has this type of use been authorised? Was the matter discussed at a meeting of the Security Committee? Has an information circular been issued defining precisely the applicable rules? But alongside these emerge others of a more general nature: What level of education do people have? Are they able to issue reports or make decisions using such tools?

The EU Regulation on Artificial Intelligence (RIA) has rightly established a duty of AI literacy imposed on the providers and deployers of such systems. They are responsible for taking measures to ensure that, to the greatest extent possible, their staff and others who are responsible for the operation and use of AI systems on their behalf have a sufficient level of AI literacy. This requires taking into account their expertise, experience, education and training. Training should be integrated into the intended context of use of the AI systems and be tailored to the profile of the individuals or groups in which the systems will be used.

Unlike in the General Data Protection Regulation, here the obligation is formulated in an express and imperative manner. There is no direct reference to this matter in the GDPR, except where it defines the training of staff involved in processing operations as one of the functions of the data protection officer. This need can also be deduced from the obligation of the processor to ensure that persons authorised to process personal data are aware of their duty of confidentiality. It is obvious that the duty of proactive accountability, data protection by design and by default, and risk management all lead to the training of users of information systems. However, the way in which this training is deployed is not always appropriate. In many organisations it is non-existent, voluntary or limited to signing a set of security obligations when taking up a job.

In the field of artificial intelligence-based information systems, the obligation to train is non-negotiable and imperative. The RIA provides for very high fines, which are specified in the Bill for the good use and governance of Artificial Intelligence. Once the future law is passed, failure to comply with Article 26.2 of the RIA, which requires human supervision of the system to be entrusted to persons with adequate competence, training and authority, will constitute a serious infringement.

Benefits of AI training

Beyond legal coercion, training people is a wise and undoubtedly beneficial decision that should be read positively and conceived as an investment. On the one hand, it helps to adopt measures aimed at managing risk, which in the case of BYOAI includes data leakage, loss of intellectual property, compliance issues and cybersecurity. On the other hand, it is necessary to manage the risks associated with regular use of AI. In this regard, it is essential that end-users have a very detailed understanding of how the technology works and of their human oversight role in the decision-making process, and that they acquire the ability to identify and report any operational issues.

However, training must pursue high-level objectives. It should be continuous, combining theory, practice and ongoing updating, and should include technical, ethical, legal and social impact aspects in order to promote a culture of knowledge and responsible use of AI in the organisation. Its benefits for the dynamics of public or private activity are wide-ranging.

With regard to its benefits, artificial intelligence (AI) literacy has become a strategic factor in transforming decision-making and promoting innovation in organisations:

By equipping teams with a solid understanding of how AI works and its applications, it facilitates the interpretation of complex data and the use of advanced tools, enabling identification of patterns and anticipation of business-relevant trends.
This specialised knowledge contributes to minimising errors and biases, as it promotes decisions based on rigorous analysis rather than intuition, and enables the detection of possible deviations in automated systems. In addition, the automation of routine tasks reduces the likelihood of human failure and frees up resources that can be focused on strategic and creative activities.
The integration of AI into the organisational culture drives a mentality oriented towards critical analysis and the questioning of technological recommendations, thus promoting an evidence-based culture. This approach not only strengthens the ability to adapt to technological advances, but also facilitates the detection of opportunities to optimise processes, develop new products and improve operational efficiency.
In the legal and ethical sphere, AI literacy helps to manage compliance and reputational risks by fostering transparent and auditable practices that build trust with both society and regulators.
Finally, understanding the impact and possibilities of AI diminishes resistance to change and favours the adoption of new technologies, accelerating digital transformation and positioning the organisation as a leader in innovation and adaptation to the challenges of today's environment.

Good practices for successful AI training

Organisations need to reflect on their training strategy in order to achieve these objectives. In this regard, it seems reasonable to share some lessons learned in the field of data protection. Firstly, all training must start by engaging the organisation's management team. There should be no reverential fear of the Governing Board, the Local Corporation or the Government of the day. The political level of any organisation should lead by example if it really wants training to permeate all human resources. And this training must be very specific, not only from a risk management point of view but also from an opportunity-based approach grounded in a culture of responsible innovation.

Similarly, although it may involve additional costs, it is necessary to consider not only the users of AI-based information systems but all staff. This will not only allow us to avoid the risks associated with BYOAI but also to establish a corporate culture that facilitates AI implementation processes.

Finally, it will be essential to adapt training to specific profiles: both users of AI-based systems, technical (IT) staff and ethical and legal mediators and enablers, as well as compliance officers or those responsible for the procurement or tendering of products and services.

Without prejudice to the contents that this type of training should logically include, there are certain values that should inspire training plans. First of all, it is important to remember that this training is compulsory and functionally adapted to the job. Secondly, it must be able to empower people and engage them in the use of AI. The EU's legal approach is based on the principle of human responsibility and oversight: the human always decides. People must therefore be able to make decisions appropriate to the output provided by the AI and to disagree with the machine's judgement, within an ecosystem that protects them and allows them to report incidents and have them reviewed.

Finally, there is one element that cannot be ignored under any circumstances: regardless of whether personal data are processed or not, and regardless of whether AI is intended for humans, its results will always have a direct or indirect impact on individuals or on society. Therefore, the training approach must integrate the ethical, legal and social implications of AI and engage users in guaranteeing fundamental rights and democracy.

Figure 2. Benefits of artificial intelligence literacy. Source: own elaboration

Ricard Martínez Martínez, Director of the Microsoft-Universitat de València Chair in Privacy and Digital Transformation

07/05/2025

Digital rights: principles, initiatives and challenges in the digital age

Blog

We live in an increasingly digitalised world where we work, study, inform ourselves and socialise through technologies. In this world, where technology and connectivity have become fundamental pillars of society, digital rights emerge as an essential component to guarantee freedom, privacy and equality in this new online facet of our lives.

Therefore, digital rights are nothing more than the extension of the fundamental rights and freedoms we already benefit from to the virtual environment. In this article we will explore what these rights are, why they are important and what are some of the benchmark initiatives in this area.

What are digital rights and why are they important?

As stated by Antonio Guterres, Secretary-General of the United Nations, during the Internet Governance Forum in 2018:

"Humanity must be at the centre of technological evolution. Technology should not use people; we should use technology for the benefit of all".

Technology should be used to improve our lives, not to dominate them. For this to be possible, as has been the case with other transformative technologies in the past, we need to establish policies that prevent as far as possible the emergence of unintended effects or malicious uses. Therefore, digital rights seek to facilitate a humanist digital transformation, where technological innovation is accompanied by protection for people, through a set of guarantees and freedoms that allow citizens to exercise their fundamental rights also in the digital environment. These include, for example:

Freedom of expression: allowing uncensored communication and exchange of ideas.
Right to privacy and data protection: guaranteeing privacy and control over personal information.
Access to information and transparency: ensuring that everyone has equal access to digital data and services.
Online security: protecting users from fraud, cyber-attacks and other risks in the digital world.

In a digital environment, where information circulates rapidly and technologies are constantly evolving, guaranteeing these rights is crucial to maintaining the integrity of our interactions, the way we access and consume information, and our participation in public life.

An international framework for digital rights

As technology has advanced, the concept of digital rights has become increasingly important worldwide in recent decades. While there is no single global charter of digital rights, there are many global and regional initiatives that point in the same direction: the United Nations Universal Declaration of Human Rights. Originally, this declaration did not mention the Internet, since it was proclaimed in 1948, when the Internet did not yet exist, but today its principles are considered fully applicable to the digital world. Indeed, the international community agrees that the same rights that we proclaim for the offline world must also be respected online: "what is illegal offline must also be illegal online".

Furthermore, the United Nations has stressed that internet access is becoming a basic enabler of other rights, so connectivity should also be considered a new human right of the 21st century.

European and international benchmarking initiatives

In recent years, several initiatives have emerged with the aim of adapting and protecting fundamental rights also in the digital environment. For example, Europe has been a pioneer in establishing an explicit framework of digital principles. In January 2023, the European Union proclaimed the European Declaration on Digital Rights and Principles for the Digital Decade, a document that reflects the European vision of a people-centred technological transformation and sets out a common framework for safeguarding citizens' freedom, security and privacy in the digital age. This declaration, together with other international initiatives, underlines the need to harmonise traditional rights with the challenges and opportunities of the digital environment.

The Declaration, jointly agreed by the European Parliament, the Council and the Commission, defines a set of fundamental principles that should guide Europe's digital age:

Focus on people and their rights: ensuring that technology serves people and respects their rights and dignity, not the other way around.
Solidarity and inclusion: promoting the digital inclusion of all social groups and bridging the digital divide.
Freedom of choice: ensuring fair and safe online environments, where users have real choice and net neutrality is respected.
Participation in the digital public space: encouraging citizens to participate actively in democratic life at all levels and to have control over their data.
Safety and security: increasing trust in digital interactions through greater security, privacy and user control, with special protection for minors.
Sustainability: orienting the digital future towards sustainability, taking into account the environmental impact of technology.

The European Declaration on Digital Rights and Principles therefore sets out a clear roadmap for the European Union's digital laws and policies, guiding its digital transformation process. While this European Declaration does not itself create laws, it does establish a joint political commitment and a roadmap of values. Furthermore, it makes clear that Europe aims to promote these principles as a global standard.

In addition, the European Commission monitors implementation in all Member States and publishes an annual monitoring report, in conjunction with the State of the Digital Decade Report, to assess progress and stay on track. Furthermore, the Declaration serves as a reference in the EU's international relations, promoting a global digital transformation centred on people and human rights.

Outside Europe, several nations have also developed their own digital rights charters, such as the Ibero-American Charter of Principles and Rights in Digital Environments, and there are also international forums such as the Internet Governance Forum which regularly discusses how to protect human rights in cyberspace. The global trend is therefore to recognise that the digital age requires adapting and strengthening existing legal protections, not by creating "new" fundamental rights out of thin air, but by translating existing ones to the new environment.

Spain's Digital Bill of Rights

In line with all these international initiatives, Spain has also taken a decisive step by proposing its own Charter of Digital Rights. This ambitious project aims to define a set of specific principles and guarantees to ensure that all citizens enjoy adequate protection in the digital environment. Its goals include:

Define privacy and security standards that respond to the needs of citizens in the digital age.
Encourage transparency and accountability in both the public and private sectors.
Promote digital inclusion, ensuring equitable access to technologies and information.

In short, this national initiative represents an effort to adapt regulations and public policies to the challenges of the digital world, strengthening citizens' confidence in the use of new technologies. Moreover, since it was published as early as July 2021, it has also contributed to subsequent reflection processes at European level, including the European Declaration mentioned above.

The Spanish Digital Bill of Rights is structured in six broad categories covering the areas of greatest risk and uncertainty in the digital world:

Freedom rights: includes classic freedoms in their digital dimension, such as freedom of expression and information on the Internet, ideological freedom in networks, the right to secrecy of digital communications, as well as the right to pseudonymity.
Equality rights: aimed at avoiding any form of discrimination in the digital environment, including equal access to technology (digital inclusion of the elderly, people with disabilities or in rural areas), and preventing bias or unequal treatment in algorithmic systems.
Participation rights and shaping of public space: this refers to ensuring citizen and democratic participation through digital media. It includes electoral rights in online environments, protection from disinformation and the promotion of diverse and respectful online public debate.
Rights in the work and business environment: encompasses the digital rights of workers and entrepreneurs. A concrete example here is the right to digital disconnection of the worker. It also includes the protection of employee privacy from digital surveillance systems at work and guarantees in teleworking, among others.
Digital rights in specific environments: this addresses particular areas that pose their own challenges, for example the rights of children and adolescents in the digital environment (protection from harmful content, parental control, right to digital education); digital inheritance (what happens to our data and accounts on the Internet after our death); digital identity (being able to manage and protect our online identity); or rights in the emerging world of artificial intelligence, the metaverse and neurotechnologies.
Effectiveness and safeguards: this last category focuses on how to ensure that all these rights are actually fulfilled. The Charter seeks to ensure that people have clear ways to complain in case of violations of their digital rights and that the authorities have the tools to enforce those rights on the internet.

As the government pointed out in its presentation, the aim is to "reinforce and extend citizens' rights, generate certainty in this new digital reality and increase people's confidence in the face of technological disruption". In other words, no new fundamental rights are created, but emerging areas (such as artificial intelligence or digital identity) are recognised where it is necessary to clarify how existing rights are applied and guaranteed.

The Digital Rights Observatory

The creation of a Digital Rights Observatory in Spain has recently been announced, a strategic tool aimed at continuously monitoring, promoting and evaluating the state and evolution of these rights in the country with the objective of contributing to making them effective. The Observatory is conceived as an open, inclusive and participatory space to bring digital rights closer to citizens, and its main functions include:

Push for the implementation of the Digital Bill of Rights, so that the ideas initially set out in 2021 do not remain theoretical, but are translated into concrete actions, laws and effective policies.
Monitor compliance with the regulations and recommendations set out in the Digital Bill of Rights.
Fight inequality and discrimination online, helping to reduce digital divides so that technological transformation does not leave vulnerable groups behind.
Identify areas for improvement and propose measures for the protection of rights in the digital environment.
Detect whether the current legal framework is lagging behind in the face of new challenges from disruptive technologies such as advanced artificial intelligence that pose risks not covered by current laws.
Encourage transparency and dialogue between government, institutions and civil society to adapt policies to technological change.

Announced in February 2025, the Observatory is part of the Digital Rights Programme, a public-private initiative led by the Government, with the participation of four ministries, and financed by the European NextGenerationEU funds within the Recovery Plan. This programme involves the collaboration of experts in the field, public institutions, technology companies, universities and civil society organisations. In total more than 150 entities and 360 professionals have been involved in its development.

This Observatory is therefore emerging as an essential resource to ensure that the protection of digital rights is kept up to date and responds effectively to the emerging challenges of the digital age.

Conclusion

Digital rights are a fundamental pillar of 21st century society, and their consolidation is a complex task that requires the coordination of initiatives at international, European and national levels. Initiatives such as the European Digital Rights Declaration and other global efforts have laid the groundwork, but it is the implementation of specific measures such as the Spanish Digital Rights Charter and the new Digital Rights Observatory that will make the difference in ensuring a free, safe and equitable digital environment for all.

In short, the protection of digital rights is not only a legislative necessity, but an indispensable condition for the full exercise of citizenship in an increasingly interconnected world. Active participation and engagement of both citizens and institutions will be key to building a fair and sustainable digital future. If we can realise these rights, the Internet and new technologies will continue to be synonymous with opportunity and freedom, not threat. After all, digital rights are simply our old rights adapted to modern times, and protecting them is the same as protecting ourselves in this new digital age.

Content prepared by Carlos Iglesias, Open data Researcher and consultant, World Wide Web Foundation. The contents and views expressed in this publication are the sole responsibility of the author.

20/03/2025

The European Union's Guide to the Deployment of the Data Governance Act: public sector intermediary services

Blog

The Data Governance Act (DGA) is part of a complex web of EU public policy and regulation, the ultimate goal of which is to create a dataset ecosystem that feeds the digital transformation of the Member States and the objectives of the European Digital Decade:

A digitally empowered population and highly skilled digital professionals.
Secure and sustainable digital infrastructures.
Digital transformation of companies.
Digitisation of public services.

Public opinion is focusing on artificial intelligence from the point of view of both the opportunities and, above all, the risks and uncertainties. However, the challenge is much more profound as it involves in each of the different layers very diverse technologies, products and services whose common element lies in the need to favour the availability of a high volume of reliable and quality-checked data to support their development.

Promoting the use of data with legislation as leverage

At their inception, Directive 2019/1024 on open data and re-use of public sector information (Open Data Directive), Directive 95/46/EC on the processing of personal data and on the free movement of such data, and subsequently Regulation 2016/679, known as the General Data Protection Regulation (GDPR), opted for the re-use of data with full guarantee of rights. However, their interpretation and application generated in practice an effect contrary to their original objectives, clearly swinging towards a restrictive model that may have affected the processes of data generation for its exploitation. The large US platforms, through a strategy of free services - search engines, mobile applications and social networks - in exchange for personal data and with mere consent, obtained the largest volume of personal data in human history, including images, voice and personality profiles.

With the GDPR, the EU wanted to eliminate 28 different ways of applying prohibitions and limitations to the use of data. Regulatory quality certainly improved, although perhaps the results achieved have not been as satisfactory as expected and this is indicated by documents such as the Digital Economy and Society Index (DESI) 2022 or the Draghi Report (The future of European competitiveness-Part A. A competitiveness strategy for Europe).

This has forced a process of legislative re-engineering that expressly and homogeneously defines the rules that make the objectives possible. The reform of the Open Data Directive, the DGA, the Artificial Intelligence Regulation and the future European Health Data Space (EHDS) should be read from at least two perspectives:

The first of these is at a high level and its function is aimed at preserving our constitutional values. The regulation adopts an approach focused on risk and on guaranteeing the dignity and rights of individuals, seeking to avoid systemic risks to democracy and fundamental rights.
The second is operational, focusing on safe and responsible product development. This strategy is based on the definition of process engineering rules for the design of products and services that make European products a global benchmark for robustness, safety and reliability.

A Practical Guide to the Data Governance Law

Data protection by design and by default, the analysis of risks to fundamental rights, the development process of high-risk artificial intelligence information systems validated by the corresponding bodies or the processes of access and reuse of health data are examples of the legal and technological engineering processes that will govern our digital development. These are not easy procedures to implement. The European Union is therefore making a significant effort to fund projects such as TEHDAS, EUHubs4Data or Quantum, which operate as a testing ground. In parallel, studies are carried out and guides are published, such as the Practical Guide to the Data Governance Law.

This Guide recalls the essential objectives of the DGA:

Regulate the re-use of certain publicly owned data subject to the rights of third parties ("protected data", such as personal data or commercially confidential or proprietary data).
Boost data sharing by regulating data intermediation service providers.
Encourage the exchange of data for altruistic purposes.
Establish the European Data Innovation Board to facilitate the exchange of best practices.

The DGA promotes the secure re-use of data through various measures and safeguards. These focus on the re-use of data from public sector bodies, data intermediation services and data sharing for altruistic purposes.

To which data does it apply? Legitimation for the processing of protected data held by public sector bodies

In the public sector, the following data are protected:

Confidential business data, such as trade secrets or know-how.
Statistically confidential data.
Data protected by the intellectual property rights of third parties.
Personal data, insofar as such data do not fall within the scope of the Open Data Directive when irreversible anonymisation is ensured and no special categories of data are concerned.

An essential starting point should be underlined: as far as personal data are concerned, the General Data Protection Regulation (GDPR) and the rules on privacy and electronic communications (Directive 2002/58/EC) also apply. This implies that, in the event of a collision between them and the DGA, the former will prevail.

Moreover, the DGA does not create a right of re-use or a new legal basis within the meaning of the GDPR for the re-use of personal data. This means that Member State or Union law determines whether a specific database or register containing protected data is open for re-use in general. Where such re-use is permitted, it must be carried out in accordance with the conditions laid down in Chapter I of the DGA.

Finally, the following are excluded from the scope of the DGA:

Data held by public companies, museums, schools and universities.
Data protected for reasons of public security, defence or national security.
Data held by public sector bodies for purposes other than the performance of their defined public functions.
Exchange of data between researchers for non-commercial scientific research purposes.

Conditions for re-use of data

It can be noted that in the area of re-use of public sector data:

▪ The DGA establishes rules for the re-use of protected data, such as personal data, confidential commercial data or statistically sensitive data.

▪ It does not create a general right of re-use, but establishes conditions where national or EU law allows such re-use.

▪ The conditions for access must be transparent, proportionate and objective, and must not be used to restrict competition. The rule mandates the promotion of data access for SMEs and start-ups, and scientific research. Exclusivity agreements for re-use are prohibited, except in specific cases of public interest and for a limited period of time.

▪ It assigns to public sector bodies the duty to ensure the preservation of the protected nature of the data. This will require the deployment of intermediation methodologies and technologies. Anonymisation and access through secure processing environments (SPEs) can play a key role. The former is a risk elimination factor, while SPEs can define a processing ecosystem that provides a comprehensive service offering to re-users, from the cataloguing and preparation of datasets to their analysis. The Spanish Data Protection Agency has published an Approach to data spaces from a GDPR perspective that includes recommendations and methodologies in this area.

▪ Re-users are subject to obligations of confidentiality and non-identification of data subjects. In case of re-identification of personal data, the re-user must inform the public sector body and there may be security breach notification obligations.

▪ Insofar as the relationship is established directly between the re-user and the public sector body, there may be cases in which the latter must provide support to the former for the fulfilment of certain duties:

To obtain, if necessary, the consent of the persons concerned for the processing of personal data.
In case of unauthorised use of non-personal data, the re-user shall inform the legal entities concerned. The public sector body that initially granted the permission for re-use may provide support if necessary.

▪ International transfers of personal data are governed by the GDPR. For international transfers of non-personal data, the re-user is required to inform the public sector body and to contractually commit to ensure data protection. However, this is an open question, since, as with the GDPR, the European Commission has the power to:

1. Propose standard contractual clauses that public sector bodies can use in their transfer contracts with re-users.

2. Where a large number of requests for re-use from specific countries justify it, adopt "equivalence decisions" designating these third countries as providing a level of protection for trade secrets or intellectual property that can be considered equivalent to that provided for in the EU.

3. Adopt the conditions to be applied to transfers of highly sensitive non-personal data, such as health data. In cases where the transfer of such data to third countries poses a risk to EU public policy objectives (in this example, public health) and in order to assist public sector bodies granting permissions for re-use, the Commission will set additional conditions to be met before such data can be transferred to a third country.

▪ Public sector bodies may charge fees for allowing re-use. The DGA's strategy aims at sustainability of the system, as fees should only cover the costs of making data available for re-use, such as the costs of anonymisation or providing a secure processing environment. This would include the costs of processing requests for re-use. Member States must publish a description of the main cost categories and the rules used for their allocation.

▪ Natural or legal persons directly affected by a decision on re-use taken by a public sector body shall have the right to lodge a complaint or to seek a judicial remedy in the Member State of that public sector body.

Organisational support

It is entirely possible that public sector bodies offering intermediation services will multiply. This is a complex environment that will require technical and legal support, backstopping and coordination.

To this end, Member States should designate one or more competent bodies whose role is to support public sector bodies granting re-use. The competent bodies shall have adequate legal, financial, technical and human resources to carry out the tasks assigned to them, including the necessary expertise. They are not supervisory bodies, they do not exercise public powers and, as such, the DGA does not set specific requirements as to their status or legal form. In addition, the competent body may be given a mandate to allow re-use itself.

Finally, States must create a Single Point of Information or one-stop shop. This Point will be responsible for transmitting queries and requests to relevant public sector bodies and for maintaining an asset list with an overview of available data resources (metadata). The single information point may be linked to local, regional or sectoral information points where they exist. At EU level, the Commission created the European Register of Protected Data held by the Public Sector (ERPD), a searchable register of information collected by national single points of information to further facilitate the re-use of data in the internal market and beyond.

EU regulations are complex to implement, so a particularly proactive effort is required to contribute to their correct understanding and implementation. The EU Guide to the Deployment of the Data Governance Act is a first tool for this purpose and will allow a better understanding of the objectives and possibilities offered by the DGA.

Content prepared by Ricard Martínez Martínez, Director of the Chair in Privacy and Digital Transformation, Department of Constitutional Law of the Universitat de València. The contents and points of view reflected in this publication are the sole responsibility of its author.

29/01/2025

Podcast: artificial intelligence and data (new challenges and legal context)

Interview

In this episode we will discuss artificial intelligence and its challenges, based on the European Regulation on Artificial Intelligence that entered into force this year. Come and find out about the challenges, opportunities and new developments in the sector from two experts in the field:

Ricard Martínez, professor of constitutional law at the Universitat de València, where he directs the Microsoft-Universitat de València Chair of Privacy and Digital Transformation.
Carmen Torrijos, computational linguist, expert in AI applied to language and professor of text mining at the Carlos III University.

Listen to the full podcast (only available in Spanish)

Summary of the interview

1. It is a fact that artificial intelligence is constantly evolving. To get into the subject, what are the latest developments in AI?

Carmen Torrijos: Many new applications are emerging. For example, this past weekend there has been a lot of buzz about an AI for image generation in X (Twitter), I don't know if you've been following it, called Grok. It has had quite an impact, not because it brings anything new, as image generation is something we have been doing since December 2023. But this is an AI that has less censorship, that is, until now we had a lot of difficulties with the generalist systems to make images that had faces of celebrities or had certain situations and it was very monitored from any tool. What Grok does is to lift all that up so that anyone can make any kind of image with any famous person or any well-known face. It is probably a passing fad. We will make images for a while and then it will pass.

And then there are also automatic podcast creation systems, such as NotebookLM. We've been watching them for a couple of months now and it has been one of the things that has really surprised me in the last few months. Because it already seems that they are all incremental innovations: on top of what we already have, they give us something better. But this is something really new and surprising. You upload a PDF and it can generate a podcast of two people talking in a totally natural, totally realistic way about that PDF. This is something that NotebookLM, which is owned by Google, can do.

2. The European Regulation on Artificial Intelligence is the world's first legal regulation on AI. With what objectives is this document, which is already a reference framework at international level, being published?

Ricard Martínez: The regulation arises from something that is implicit in what Carmen has told us. All this that Carmen tells is because we have opened ourselves up to the same unbridled race that we experienced with the emergence of social media. Because when this happens, it's not innocent, it's not that companies are being generous, it's that companies are competing for our data. They gamify us, they encourage us to play, they encourage us to provide them with information, so they open up. They do not open up because they are generous, they do not open up because they want to work for the common good or for humanity. They open up because we are doing their work for them. What does the EU want to stop? What we learned from social media. The European Union has two main approaches, which I will try to explain very succinctly. The first approach is a systemic risk approach. The European Union has said: "I will not tolerate artificial intelligence tools that may endanger the democratic system, i.e. the rule of law and the way I operate, or that may seriously infringe fundamental rights". That is a red line.

The second approach is a product-oriented approach. An AI is a product. When you make a car, you follow rules that manage how you produce that car, and that car comes to market when it is safe, when it has all the specifications. This is the second major focus of the Regulation. The regulation says that you can be developing a technology because you are doing research and I almost let you do whatever you want. Now, if this technology is to come to market, you will catalogue the risk. If the risk is low or slight, you are going to be able to do a lot of things and, practically speaking, with transparency and codes of conduct, I will give you a pass. But if it's a high risk, you're going to have to follow a standardised design process, and you're going to need a notified body to verify that technology, make sure that in your documentation you've met what you have to meet, and then they'll give you a CE mark. And that's not the end of it, because there will be post-market surveillance. So, throughout the life cycle of the product, you need to ensure that this works well and that it conforms to the standard.

On the other hand, a tight control is established with regard to big data models, not only LLM, but also image or other types of information, where it is believed that they may pose systemic risks.

In that case, there is a very direct control by the Commission. So, in essence, what they are saying is: "respect rights, guarantee democracy, produce technology in an orderly manner according to certain specifications".

Carmen Torrijos: Yes, in terms of objectives it is clear. I have taken up Ricard's last point about producing technology in accordance with this Regulation. We have this mantra that the US does things, Europe regulates things and China copies things. I don't like to generalise like that. But it is true that Europe is a pioneer in terms of legislation and we would be much stronger if we could produce technology in line with the regulatory standards we are setting. Today we still can't, maybe it's a question of giving ourselves time, but I think that is the key to technological sovereignty in Europe.

3. In order to produce such technology, AI systems need data to train their models. What criteria should the data meet in order to train an AI system correctly? Could open data sets be a source? In what way?

Carmen Torrijos: The data we feed AI with is the point of greatest conflict. Can we train with any dataset even if it is available? We are not talking about open data, but about available data.

Open data is, for example, the basis of all language models, and everyone knows this, which is Wikipedia. Wikipedia is an ideal example for training, because it is open, it is optimised for computational use, it is downloadable, it is very easy to use, there is a lot of language, for example, for training language models, and there is a lot of knowledge of the world. This makes it the ideal dataset for training an AI model. And Wikipedia is in the open, it is available, it belongs to everyone and it is for everyone, you can use it.

But can all the datasets available on the Internet be used to train AI systems? That is rather doubtful. Because the fact that something is published on the Internet does not mean that it is public, for public use, even though you can take it, train a system and start generating profit from that system. It has copyright, authorship and intellectual property. That I think is the most serious conflict we have right now in generative AI because it uses content to inspire and create. And there, little by little, Europe is taking small steps. For example, the Ministry of Culture has launched an initiative to start looking at how we can create content, licensed datasets, to train AI in a way that is legal, ethical and respectful of authors' intellectual property rights.

All this is generating a lot of friction. Because if we go on like this, we will turn against many illustrators, translators, writers, etc. (all creators who work with content), because they will not want this technology to be developed at the expense of their content. Somehow you have to find a balance in regulation and innovation to make both happen. From the large technological systems that are being developed, especially in the United States, there is a repeated idea that only with licensed content, with legal datasets that are free of intellectual property, or that the necessary returns have been paid for their intellectual property, it is not possible to reach the level of quality of AIs that we have now. That is, only with legal datasets alone we would not have ChatGPT at the level ChatGPT is at now.

This is not set in stone and does not have to be the case. We have to continue researching, that is, we have to continue to see how we can achieve a technology of that level, but one that complies with the regulation. Because what they have done in the United States, what GPT-4 has done, the large language models, the large image generation models, is to show us the way. This is as far as we can go. But they have done so by taking content that is not theirs, that it was not permissible to take. We have to get back to that level of quality, back to that level of performance of the models, respecting the intellectual property of the content. And that is a role that I believe is primarily Europe's responsibility.

4. Another issue of public concern with regard to the rapid development of AI is the processing of personal data. How should they be protected and what conditions does the European regulation set for this?

Ricard Martínez: There is a set of conducts that have been prohibited essentially to guarantee the fundamental rights of individuals. But it is not the only measure. I attach a great deal of importance to an article in the regulation that we are probably not going to give much thought to, but for me it is key. There is an article, the fourth one, entitled AI Literacy, which says that any subject that is intervening in the value chain must have been adequately trained. You have to know what this is about, you have to know what the state of the art is, you have to know what the implications are of the technology you are going to develop or deploy. I attach great value to it because it means incorporating throughout the value chain (developer, marketer, importer, company deploying a model for use, etc.) a set of values that entail what is called accountability, proactive responsibility, by default. This can be translated into a very simple element, which has been talked about for two thousand years in the world of law, which is 'do no harm', the principle of non-maleficence.

With something as simple as that, "do no harm to others, act in good faith and guarantee your rights", there should be no perverse effects or harmful effects, which does not mean that it cannot happen. And this is precisely what the Regulation says in particular when it refers to high-risk systems, but it is applicable to all systems. The Regulation tells you that you have to ensure compliance processes and safeguards throughout the life cycle of the system. That is why it is so important to have robustness, resilience and contingency plans that allow you to revert, shut down, switch to human control, change the usage model when an incident occurs.

Therefore, the whole ecosystem is geared towards this objective of doing no harm and not infringing rights. And there is an element that no longer depends on us, it depends on public policy. AI will not only infringe on rights, it will change the way we understand the world. If there are no public policies in the education sector that ensure that our children develop computational thinking skills and are able to have a relationship with a machine-interface, their access to the labour market will be significantly affected. The same applies if we do not ensure the continuous training of active workers, along with public policies for those sectors that are doomed to disappear.

Carmen Torrijos: I find Ricard's approach, that to train is to protect, very interesting. Train people, inform people, get people trained in AI, not only people in the value chain, but everybody. The more you train and empower, the more you are protecting people.

When the law came out, there was some disappointment in AI environments and especially in creative environments. Because we were in the midst of the generative AI boom and generative AI was hardly being regulated, but other things were being regulated that we took for granted would not happen in Europe, but that have to be regulated so that they cannot happen. For example, biometric surveillance: Amazon can't read your face to decide whether you are sadder that day and sell you more stuff or get more advertising or a particular advertisement. I say Amazon, but it can be any platform. This, for example, will not be possible in Europe because it is forbidden by law, it is an unacceptable use: biometric surveillance.

Another example is social scoring, the social scoring that we see happening in China, where citizens are given points and access to public services based on these points. That is not going to be possible either. And this part of the law must also be considered, because we take it for granted that this is not going to happen to us, but when you don't regulate it, that's when it happens. China has installed 600 million facial recognition technology (FRT) cameras, which recognise you with your ID card. That is not going to happen in Europe because it cannot, because it is also biometric surveillance. So you have to understand that the law perhaps seems to be slowing down on what we are now enraptured by, which is generative AI, but it has been dedicated to addressing very important points that needed to be covered in order to protect people, in order not to lose fundamental rights that we have already won.

Finally, ethics has a very uncomfortable component, which nobody wants to look at, which is that sometimes a system has to be withdrawn. Sometimes it is necessary to remove something that is in operation, even something that is providing a benefit, because it is incurring some kind of discrimination, or because it is bringing some kind of negative consequence that violates the rights of a collective, of a minority or of someone vulnerable. And that is very complicated. When we have become accustomed to having an AI operating in a certain context, which may even be a public context, it is hard to stop and say that this is discriminating against people, so this system cannot continue in production and has to be removed. This point is very complicated, it is very uncomfortable, and when we talk about ethics, which we do very easily, we must also think about how many systems we are going to have to stop and review before we can put them back into operation, however easy they make our lives or however innovative they may seem.

5. In this sense, taking into account all that the Regulation contains, some Spanish companies, for example, will have to adapt to this new framework. What should organisations already be doing to prepare? What should Spanish companies review in the light of the European regulation?

Ricard Martínez: This is very important, because there is a corporate business level of high capabilities that I am not worried about, because these companies understand that we are talking about an investment. They already invested in a process-based model that integrated compliance by design for data protection. The next leap, which is to do exactly the same thing with artificial intelligence, I won't say that it is unimportant, because it is important, but let's say that it follows a path that has already been tried. These companies already have compliance units, they already have advisors, and they already have routines into which the artificial intelligence regulatory framework can be integrated as part of the process. In the end, what it will do is increase risk analysis in one sense. It will surely force the design processes and also the design phases themselves to be modular, i.e., while in software design we are practically talking about going from a non-functional model to writing code, here there are a series of tasks of enrichment, annotation and validation of the data sets, and of prototyping, that surely require more effort, but they are routines that can be standardised.

In my experience in European projects where we have worked with clients, i.e. SMEs, who expect AI to be plug and play, what we have seen is a huge lack of capacity building. The first question you should ask yourself is not whether your company needs AI, but whether your company is ready for AI. This is an earlier and rather more relevant question. You think you can make a leap into AI, that you can contract a certain type of services, and we are realising that you don't even comply with the data protection regulation.

There is something, an entity called the Spanish Agency for Artificial Intelligence, AESIA, and there is a Ministry of Digital Transformation, and if there are no accompanying public policies, we may incur risky situations. Why? Because I have the great pleasure of training future entrepreneurs in artificial intelligence in undergraduate and postgraduate courses. When confronted with the ethical and legal framework, I won't say they want to die, but the world comes crashing down on them. Because there is no support, there is no accompaniment, there are no resources, or they cannot see them, that do not involve a round of investment that they cannot bear, or there are no guided models that help them in a way that is, I won't say easy, but at least usable.

Therefore, I believe that there is a substantial challenge in public policies, because if this combination does not happen, the only companies that will be able to compete are those that already have a critical mass, an investment capacity and an accumulated capital that allows them to comply with the standard. This situation could lead to a counterproductive outcome.

We want to regain European digital sovereignty, but if there are no public investment policies, the only ones who will be able to comply with the European standard are companies from other countries.

Carmen Torrijos: Not because they are from other countries but because they are bigger.

Ricard Martínez: Yes, not to mention countries.

6. We have talked about challenges, but it is also important to highlight opportunities. What positive aspects could you highlight as a result of this recent regulation?

Ricard Martínez: I am working on the construction, with European funding, of Cancer Image EU, which is intended to be a digital infrastructure for cancer imaging. At the moment, we are talking about a partnership involving 14 countries, 76 organisations, on the way to 93, to generate a medical imaging database of 25 million cancer images with associated clinical information for the development of artificial intelligence. The infrastructure is being built, it does not yet exist, and even so, at the Hospital La Fe in Valencia, research is already underway with mammograms of women who have undergone biennial screening and then developed cancer, to see if it is possible to train an image analysis model capable of preventively recognising that little spot that the oncologist or radiologist did not see and that later turned out to be a cancer. Does it mean you're getting chemotherapy five minutes later? No. It means they are going to monitor you, they are going to have an early reaction capability. And that the health system will save 200,000 euros. To mention just one opportunity.

On the other hand, opportunities must also be sought in other rules, not only in the Artificial Intelligence Regulation. You have to look at the Data Governance Act, which wants to counter the data monopoly held by US companies with a sharing of data from the public sector, the private sector and the citizenry itself; at the Data Act, which aims to empower citizens to retrieve their data and share it by consent; and finally at the European Health Data Space, which aims to create a health data ecosystem to promote innovation, research and entrepreneurship. It is this ecosystem of data spaces that should be a huge generator of opportunity spaces.

And furthermore, I don't know whether they will succeed or not, but it aims to be coherent with our business ecosystem. That is to say, an ecosystem of small and medium-sized enterprises that does not have high data generation capabilities and what we are going to do is to build the field for them. We are going to create the data spaces for them, we are going to create the intermediaries, the intermediation services, and we hope that this ecosystem as a whole will allow European talent to emerge from small and medium-sized enterprises. Will it be achieved or not? I don't know, but the opportunity scenario looks very interesting.

Carmen Torrijos: If you ask for opportunities, all opportunities. Not only artificial intelligence, but all technological progress, is such a huge field that it can bring opportunities of all kinds. What needs to be done is to lower the barriers, which is the problem we have. And we also have barriers of many kinds, because we have technical barriers, talent barriers, salary barriers, disciplinary barriers, gender barriers, generational barriers, and so on.

We need to focus energies on lowering those barriers, and then I also think we still come from the analogue world and have little global awareness that both digital and everything that affects AI and data is a global phenomenon. There is no point in keeping it all local, or national, or even European, but it is a global phenomenon. The big problems we have come because we have technology companies that are developed in the United States working in Europe with European citizens' data. A lot of friction is generated there. Anything that can lead to something more global will always be in favour of innovation and will always be in favour of technology. The first thing is to lift the barriers within Europe. That is a very positive part of the law.

7. At this point, we would like to take a look at the state we are in and the prospects for the future. How do you see the future of artificial intelligence in Europe?

Ricard Martínez: I have two visions: one positive and one negative. And both come from my experience in data protection. If now that we have a regulatory framework, the regulatory authorities, I am referring to artificial intelligence and data protection, are not capable of finding functional and grounded solutions, and they generate public policies from the top down and from an excellence that does not correspond to the capacities and possibilities of research - I am referring not only to business research, but also to university research - I see a very dark future. If, on the other hand, we understand regulation in a dynamic way with supportive and accompanying public policies that generate the capacities for this excellence, I see a promising future because in principle what we will do is compete in the market with the same solutions as others, but responsive: safe, responsible and reliable.

Carmen Torrijos: Yes, I very much agree. I would introduce the time variable into that, because I think we have to be very careful not to create more inequality than we already have. More inequality among companies, more inequality among citizens. If we are careful with this, which is easy to say but difficult to do, I believe that the future can be bright, but it will not be bright immediately. In other words, we are going to have to go through a darker period of adapting to change. Just as many issues of digitalisation are no longer alien to us, have already been worked on, we have already gone through them and have already regulated them, artificial intelligence also needs its time.

We have had very few years of AI, very few years of generative AI. In fact, two years is nothing in a worldwide technological change. And we have to give time to laws and we also have to give time for things to happen. To give a very obvious example, the New York Times' complaint against Microsoft and OpenAI has not yet been resolved. It's been a year, it was filed in December 2023, the New York Times complains that they have trained AI systems with their content and in a year nothing has been achieved in that process. Court proceedings are very slow. We need more to happen, and more processes of this type to be resolved, in order to have precedents and to have maturity as a society in what is happening, and we still have a long way to go. It's like almost nothing has happened. So, the time variable I think is important and I think that, although at the beginning we have a darker future, as Ricard says, I think that in the long term, if we keep clear limits, we can reach something brilliant.

Interview clips

Clip 1. What criteria should the data have to train an AI system?

Clip 2. What should Spanish companies review in light of the AI Regulation?

28/01/2025

Podcast: data governance

Interview

This episode focuses on data governance and why it is important to have standards, policies and processes in place to ensure that data is correct, reliable, secure and useful. For this purpose, we analyze the Model Ordinance on Data Governance of the Spanish Federation of Municipalities and Provinces, known as the FEMP, and its application in a public body such as the City Council of Zaragoza. This will be done by the following guests:

Roberto Magro Pedroviejo, Coordinator of the Open Data Working Group of the Network of Local Entities for Transparency and Citizen Participation of the Spanish Federation of Municipalities and Provinces and civil servant of the Alcobendas City Council.

María Jesús Fernández Ruiz, Head of the Technical Office of Transparency and Open Government of Zaragoza City Council.

Listen to the full podcast (only available in Spanish)

Summary of the interview

1. What is data governance?

Roberto Magro Pedroviejo: We, in the field of Public Administrations, define data governance as an organisational and technical mechanism that comprehensively addresses issues related to the use of data in our organisation. It covers the entire data lifecycle, i.e. from creation to archiving or even, if necessary, purging and destruction. Its purpose is that data is of quality and available to all those who need it: sometimes it will be only the organisation itself internally, but many other times it will be the general public, re-users, the university environment, etc. Data governance must facilitate the right of access to data. In short, data governance makes it possible to respond to the objective of managing our administration effectively and efficiently and achieving greater interoperability between all administrations.

2. Why is this concept important for a municipality?

María Jesús Fernández Ruiz: Because we have found that, within organisations, both public and private, data collection and management is often carried out without following homogeneous criteria, standards or appropriate techniques. This translates into a difficult and costly situation, which is exacerbated when we try to develop a data space or develop data-related services. Therefore, we need an umbrella that obliges us to manage data, as Roberto has said, effectively and efficiently, following homogeneous standards and criteria, which facilitates interoperability.

3. To meet this challenge, it is necessary to establish a set of guidelines to help local administrations set up a legal framework. For this reason, the FEMP Model Ordinance on Data Governance has been created. What was the process of developing this reference document like?

Roberto Magro Pedroviejo: Within the Open Data Network Group that was created back in 2017, one of the people we have counted on and who has contributed a lot of ideas has been María Jesús, from Zaragoza City Council. We were just coming out of COVID, in March 2021, and I remember perfectly the meeting we had in a room lent to us by the Madrid City Council in the Cibeles Palace. María Jesús was in Zaragoza and joined the meeting by videoconference. On that day, while looking at what work we could tackle within this multidisciplinary group, María Jesús proposed creating a model ordinance. The FEMP and the Network already had experience in creating model ordinances to try to improve, and above all help, municipalities and local entities or councils to create regulations.

We started working as a multidisciplinary team, led by José Félix Muñoz Soro, from the University of Zaragoza, who is the person who has coordinated the regulatory text that we have published. And a few months later, in January 2022 to be precise, we held a meeting. We met in person at the Zaragoza City Council and there we began to establish the basis for the model ordinance, what type of articles it should have, what type of structure it should have, etc. And we got together a multidisciplinary team, as we said, which included experts in data governance and jurists from the University of Zaragoza, staff from the Polytechnic University of Madrid, colleagues from the Polytechnic University of Valencia, professionals from the local public sphere and journalists who are experts in open data.

The first draft was published in May/June 2022. In addition, it was made available for public consultation through Zaragoza City Council's Citizen Participation platform. We contacted around 100 national experts and received around 30 contributions of improvements, most of which were included, and which allowed us to have the final text by the end of last year, which was passed to the legal department of the FEMP to validate it. The regulations were published in February 2024 and are now available on the Network's website for free download.

I would like to take this opportunity to thank the excellent work done by all the people involved in the team who, from their respective points of view, have worked selflessly to create this knowledge and share it with all the Spanish public administrations.

4. What are the expected benefits of the ordinance?

María Jesús Fernández Ruiz: For me, one of the main objectives of the ordinance, and I think it is a great instrument, is that it takes the whole life cycle of the data. It covers from the moment the data is generated, how the data is managed, how the data is provided, how the documentation associated with the data must be stored, how the historical data must be stored, etc. The most important thing is that it establishes criteria for managing the data while respecting its entire life cycle.

The ordinance also establishes some principles, which are not many, but which are very important and which set the tone, which speak, for example, of effective data governance and describe the importance of establishing processes when generating the data, managing the data, providing the data, etc.

Another very important principle, which has been mentioned by Roberto, is the ethical treatment of data. In other words, the importance of collecting data traceability, of seeing where the data is moving and of respecting the rights of natural and legal persons.

Another very important principle that generates a lot of noise in the institutions is that data must be managed from the design phase, the management of data by default. Often, when we start working on data with openness criteria, we are already in the middle or near the end of the data lifecycle. We have to design data management from the beginning, from the source. This saves us a lot of resources, both human and financial.

Another important issue for us and one that we advocate within the ordinance is that administration has to be data-oriented. It has to be an administration that is going to design its policies based on evidence. An administration that will consider data as a strategic asset and will therefore provide the necessary resources.

And another issue, which we often discuss with Roberto, is the importance of data culture. When we work on and publish data, data that is interoperable, that is easy to reuse, that is understood, etc., we cannot stop there, but we must talk about the data culture, which is also included in the ordinance. It is important that we disseminate what data is, what quality data is, how to access data, how to use data. In other words, every time we publish a dataset, we must consider actions related to data culture.

5. Zaragoza City Council has been a pioneer in the application of this ordinance. What has this implementation process been like and what challenges are you facing?

María Jesús Fernández Ruiz: This challenge has been very interesting and has also helped us to improve. It was very fast at the beginning and already in June we were going to present the ordinance to the city government. There is a process where the different parties make private votes on the ordinance and say "this point I like", "this point seems more interesting", "this one should be modified", etc. Our surprise is that we have had more than 50 private votes on the ordinance, after having gone through the public consultation process and having appeared in all the media, which was also enriching, and we have had to respond to these votes. The truth is that it has helped us to improve and, at the moment, we are waiting for it to go to government.

When people ask me "How do you feel, María Jesús?", the answer is: well, we are making progress, because thanks to this ordinance, which is pending approval by the Zaragoza City Council government, we have already issued a series of contracts. One is extremely important for us: to draw up an inventory of data and information sources in our institution, which we believe is the basic instrument for managing data, knowing what data we have, where they originate, what traceability they have, etc. Therefore, we have not stopped. Thanks to this framework that has not yet been approved, we have been able to make progress on the basis of contracts and on something that is basic in an institution: the definition of the professionals who have to participate in data management.

6. You mentioned the need to develop an inventory of datasets and information sources, what kind of datasets are we talking about and what descriptive information should be included for each?

Roberto Magro Pedroviejo: There is a core, let's say a central core, with a series of datasets that we recommend in the ordinance itself, referring to other work done in the open data group, which is to recommend 80 datasets that we could publish in Spanish public administrations. The focus is also on high-value datasets, those that can most benefit municipal management or can benefit by providing social and economic value to the general public and to the business community and reusers. Any administration that wants to start working on the issue of datasets and wonders where to start publishing or managing data has to focus, in my view, on three key areas in a city:

The personal data, i.e. our beloved census: who are the people living in our city, their ages, gender, postal addresses, etc.
The urban and territorial data, that is, where these people live, what the territorial delimitation of the municipality is, etc. Everything that has to do with these sets of data related to streets, roads, even sewerage, public roads or lighting, needs to be inventoried, to know where these data are and to have them, as we have already said, updated, structured, accessible, etc.
And finally, everything that has to do with how the city is managed, of course, with the tax and budget area.

That is: the personal sphere, the territorial sphere and the taxation sphere. That is what we recommend to start with. And in the end, this inventory of datasets describes what they are, where they are, how they are and will be the first basis on which to start building data governance.
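As an illustration only, an inventory entry of the kind described above could be sketched as follows. This is a minimal sketch: the field names, area labels and example values are assumptions made for this example and are not taken from the FEMP ordinance.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the three spheres named in the interview.
AREAS = ("personal", "territorial", "taxation")


@dataclass(frozen=True)
class InventoryEntry:
    """One dataset in the inventory (all fields are illustrative)."""
    name: str      # what the dataset is
    area: str      # which of the three spheres it belongs to
    location: str  # where it is held
    fmt: str       # how it is stored
    owner: str     # unit responsible for it

    def __post_init__(self):
        # Reject entries that do not fall into one of the three spheres.
        if self.area not in AREAS:
            raise ValueError(f"unknown area: {self.area!r}")


def group_by_area(entries):
    """Return a dict mapping each area to the names of its datasets."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.area].append(entry.name)
    return dict(grouped)


# Made-up example entries, one per sphere.
inventory = [
    InventoryEntry("municipal census", "personal",
                   "population register", "database table", "statistics unit"),
    InventoryEntry("street map", "territorial",
                   "GIS server", "geospatial layer", "urban planning unit"),
    InventoryEntry("annual budget", "taxation",
                   "finance system", "spreadsheet", "treasury unit"),
]

print(group_by_area(inventory))
```

Grouping by area makes it easy to see which of the three spheres still has no inventoried datasets.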

María Jesús Fernández Ruiz: Another issue that is also very fundamental, which is included in the ordinance, is to define the master datasets. Just a little anecdote. When creating a spatial data space, the street map, the base cartography and the portal holder are basic. When we got together to work, a technical commission was set up and we considered these to be master datasets for Zaragoza City Council. The quality of the data is determined by a concept in the ordinance, which is respecting the sovereignty of the data: whoever creates the data is the sovereign of the data and is responsible for the quality of the data. Sovereignty must be respected and that determines quality.

We then discovered that, in Zaragoza City Council, we had five different portal identifiers. To improve this situation, we defined a descriptive unique identifier, which we declared as master data. In this way, all municipal entities will use the same identifier, the same street map, the same cartography, etc. and this will make all services related to the city interoperable.
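The idea of a single master identifier can be sketched, under assumptions, as a lookup table from each system's local identifier to one shared value. The system names and identifier formats below are invented for illustration and do not describe Zaragoza City Council's actual systems.

```python
# Hypothetical mapping: (source system, local identifier) -> master identifier.
MASTER_IDS = {
    ("cadastre", "C-0041"): "PORTAL-000123",
    ("census", "9931"): "PORTAL-000123",
    ("licences", "LIC/77"): "PORTAL-000123",
    ("taxes", "T-5"): "PORTAL-000456",
}


def to_master_id(system, local_id):
    """Return the master identifier for a local one, or None if unmapped."""
    return MASTER_IDS.get((system, local_id))


def same_portal(a, b):
    """True when two (system, local_id) pairs resolve to the same master id."""
    master_a = to_master_id(*a)
    master_b = to_master_id(*b)
    return master_a is not None and master_a == master_b


# Records from two different systems can now be joined on the master id.
print(same_portal(("cadastre", "C-0041"), ("census", "9931")))
print(same_portal(("cadastre", "C-0041"), ("taxes", "T-5")))
```

Unmapped identifiers resolve to None and never match each other, so gaps in the mapping surface as missing joins rather than as silent false matches.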

7. What additional improvements do you think could be included in future revisions of the ordinance?

Roberto Magro Pedroviejo: The ordinance itself, being a regulatory instrument, is adapted to current Spanish and European regulations. In other words, we will have to be very vigilant (we already are) about everything that is being published on artificial intelligence, data spaces and open data. The ordinance will have to be adapted because it is a regulatory framework to comply with current legislation, but if that regulatory framework changes, we will make the appropriate modifications for compliance.

I would also like to highlight two things. There have been more town councils and a university, specifically the Town Council of San Feliu de Llobregat and the University of La Laguna, interested in the ordinance. We have received more requests to know a little more about the ordinance, but the bravest have been the Zaragoza City Council, who were the ones who proposed it and are the ones who are suffering the process of publication and final approval. From this experience that Zaragoza City Council itself is gaining, we will surely all learn, about how to tackle it in each of the administrations, because we copy each other and we can go faster. I believe that, little by little, once Zaragoza publishes the ordinance, other city councils and other institutions will join in. Firstly, because it helps to organise the inside of the house. Now that we are in a process of digital transformation that is not fast, but rather a long process, this type of ordinance will help us, above all, to organise the data we have in the administration. Data and the management of data governance will help us to improve public management within the organisation itself, but above all in terms of the services provided to citizens.

And the last thing I wanted to emphasise, which is also very important, is that, if the data is not of high quality, is not updated and is not metadata-driven, we will do little or nothing in the administration from the point of view of artificial intelligence, because artificial intelligence will be based on the data we have and if it is not correct or updated, the results and predictions that AI can make will be of no use to us in the public administration.

María Jesús Fernández Ruiz: What Roberto has just said about artificial intelligence and quality data is very important. And I would like to add two things that we are learning in implementing this ordinance. Firstly, the need to define processes, i.e. efficient data management has to be based on processes. And another thing that I think we should talk about, and we will talk about within the FEMP, is the importance of defining the roles of the different professionals involved in data management. We are talking about data manager, data provider, technology provider, etc. If I had the ordinance now, I would talk about that definition of the roles that have to be involved in efficient data management. That is, processes and professionals.

Interview clips

Clip 1. What is data governance?

Clip 2. What is the FEMP Model Ordinance on Data Governance?

05/12/2024

Legislation and justice

Sectors

04/12/2024