Event

Once again, the Junta de Castilla y León has launched its open data contest to reward the innovative use of public information.

In this post, we summarize the details needed to participate in the IX edition of this event, an opportunity for professionals, students, creatives and multidisciplinary teams who wish to showcase their talent through the reuse of public data.

What does the competition consist of?

The aim of the competition is to recognize projects that use open datasets from the Junta de Castilla y León. These datasets can be combined, if the participants wish, with other public or private sources, at any level of administration.

Projects can be submitted in four categories:

  • Ideas category: aimed at people or teams who want to submit a proposal to create a service, study, application, website or any other type of development. The project does not need to be completed; the important thing is that the idea is original, viable and has a potential positive impact.
  • Products and services category: designed for projects already developed and accessible to citizens, such as online services, mobile applications or websites. All developments must be available via a public URL. This category includes a specific award for students enrolled in official education during the 2024/2025 or 2025/2026 school years.
  • Didactic resource category: aimed at educational projects that use open data as a support tool in the classroom. The aim is to promote innovative teaching through Creative Commons licensed resources, which can be shared and reused by teachers and students.
  • Data journalism category: rewards journalistic works, in written or audiovisual format, published or substantially updated, that make use of open data to inform, contextualize or analyze topics of interest to citizens. The pieces must have been published in a print or digital medium since September 24, 2024, the day after the submission deadline of the immediately preceding call for awards.

In all categories, it is essential that at least one dataset from the open data portal of the Junta de Castilla y León is used. This platform has hundreds of datasets on different sectors such as the environment, economy, society, public administration, culture, education, etc. that can be used as a basis to develop useful, informative and transformative ideas.

Who can participate?

The competition is open to any natural or legal person, participating individually or as a group. In addition, participants may submit more than one application, even in different categories. Although the same project may not receive more than one award, this flexibility allows the same idea to be explored from different angles: educational, journalistic, technical or conceptual.

What prizes are awarded?

The 2025 edition of the contest includes prizes with a financial endowment, an accrediting diploma and institutional dissemination through the open data portal and other communication channels of the Junta de Castilla y León.

The distribution and amount of the prizes by category is:

  • Ideas category
    • First prize: €1,500
    • Second prize: €500
  • Products and services category
    • First prize: €2,500
    • Second prize: €1,500
    • Third prize: €500
    • Special Student Prize: €1,500
  • Didactic resource category
    • First prize: €1,500
  • Data journalism category
    • First prize: €1,500
    • Second prize: €1,000

Under what criteria are the prizes awarded?

The jury will assess the candidacies against different evaluation criteria, as set out in the rules and the call order, including originality, social utility, technical quality, feasibility, impact, economic value and degree of innovation.

How to participate?

As in other editions, candidacies can be submitted in two ways:

  • In person, at the General Registry of the Ministry of the Presidency, at the registry assistance offices of the Junta de Castilla y León or at the places established in article 16.4 of Law 39/2015.
  • Electronically, through the electronic office (sede electrónica) of the Junta de Castilla y León.

Each application must include:

  • Identification data of the author(s).
  • Title of the project.
  • Category or categories to which it is submitted.
  • An explanatory report of the project, with a maximum length of 1,000 words, providing all the information that can be assessed by the jury according to the established scale.
  • In the case of applications to the Products and services category, the URL to access the project must be specified.

The deadline to submit proposals is September 22, 2025.

With this contest, the Junta de Castilla y León reaffirms its commitment to the open data policy and the culture of reuse. The competition not only recognizes the creativity, innovation and usefulness of the projects presented, but also contributes to disseminating the transformative potential of open data in areas such as education, journalism, technology or social entrepreneurship.

In previous editions, solutions to improve mobility, interactive maps on forest fires, tools for the analysis of public expenditure or educational resources on the rural environment, among many other examples, have been awarded. You can read more about last year's winning proposals and others on our website. In addition, all these projects can be consulted in the history of winners available on the community's open data portal.

We encourage you to participate in the contest and get the most out of open data in Castilla y León!

Blog

Over the last few years we have seen spectacular advances in the use of artificial intelligence (AI) and, behind all these achievements, we will always find the same common ingredient: data. An illustrative example known to everyone is that of the language models used by OpenAI for its famous ChatGPT, such as GPT-3, one of its first models that was trained with more than 45 terabytes of data, conveniently organized and structured to be useful.

Without sufficient availability of quality and properly prepared data, even the most advanced algorithms will not be of much use, neither socially nor economically. In fact, Gartner estimates that more than 40% of emerging AI agent projects today will end up being abandoned in the medium term due to a lack of adequate data and other quality issues. Therefore, the effort invested in standardizing, cleaning, and documenting data can make the difference between a successful AI initiative and a failed experiment. In short, the classic principle of "garbage in, garbage out" in computer engineering applied this time to artificial intelligence: if we feed an AI with low-quality data, its results will be equally poor and unreliable.

Awareness of this problem has given rise to the concept of "AI Data Readiness", i.e. the preparation of data to be used by artificial intelligence. In this article, we'll explore what it means for data to be "AI-ready", why it's important, and what we'll need for AI algorithms to be able to leverage our data effectively. Done well, this results in greater social value, favoring the elimination of biases and the promotion of equity.

What does it mean for data to be "AI-ready"?

Having AI-ready data means that this data meets a series of technical, structural, and quality requirements that optimize its use by artificial intelligence algorithms. This includes multiple aspects, such as the completeness of the data, the absence of errors and inconsistencies, the use of appropriate formats, metadata and homogeneous structures, as well as the context needed to verify that the data are aligned with the use that AI will make of them.

Preparing data for AI often requires a multi-stage process. For example, the consulting firm Gartner, again, recommends following these steps (a minimal sketch of basic readiness checks is included after the list):

  1. Assess data needs according to the use case: identify which data is relevant to the problem we want to solve with AI (the type of data, volume needed, level of detail, etc.), understanding that this assessment can be an iterative process that is refined as the AI project progresses.
  2. Align business areas and get management support: present data requirements to managers based on identified needs and get their backing, thus securing the resources required to prepare the data properly.
  3. Develop good data governance practices: implement appropriate data management policies and tools (quality, catalogs, data lineage, security, etc.) and ensure that they also incorporate the needs of AI projects.
  4. Expand the data ecosystem: integrate new data sources, break down potential barriers and silos that are working in isolation within the organization and adapt the infrastructure to be able to handle the large volumes and variety of data necessary for the proper functioning of AI.
  5. Ensure scalability and regulatory compliance: ensure that data management can scale as AI projects grow, while maintaining a robust governance framework in line with the necessary ethical protocols and compliance with existing regulations.
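
As a purely illustrative companion to these steps, the snippet below shows the kind of basic readiness checks (completeness, duplicates, schema conformance) that usually come up when assessing data needs; the column names, types and the 5% missing-value threshold are assumptions chosen for the example, not part of any standard.

```python
import pandas as pd

# Illustrative schema expected by a hypothetical AI use case
EXPECTED_COLUMNS = {"station_id": "int64", "timestamp": "datetime64[ns]", "no2_ugm3": "float64"}
MAX_MISSING_RATIO = 0.05  # assumption: tolerate at most 5% missing values per column

def basic_readiness_report(df: pd.DataFrame) -> dict:
    """Run simple completeness, duplication and schema checks on a dataset."""
    report = {
        "missing_ratio": df.isna().mean().to_dict(),   # completeness per column
        "duplicate_rows": int(df.duplicated().sum()),  # exact duplicate records
        "missing_columns": sorted(set(EXPECTED_COLUMNS) - set(df.columns)),
        "rows": len(df),
    }
    report["passes"] = (
        not report["missing_columns"]
        and report["duplicate_rows"] == 0
        and all(r <= MAX_MISSING_RATIO for r in report["missing_ratio"].values())
    )
    return report

if __name__ == "__main__":
    sample = pd.DataFrame({
        "station_id": [1, 1, 2],
        "timestamp": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
        "no2_ugm3": [21.5, 21.5, None],
    })
    print(basic_readiness_report(sample))
```

In practice these checks would be far richer (bias, lineage, documentation), but even a simple automated report like this makes the "readiness" conversation with business areas much more concrete.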

If we follow a strategy like this one, we will be able to integrate the new requirements and needs of AI into our usual data governance practices. In essence, it is simply a matter of ensuring that our data is prepared to feed AI models with the minimum possible friction, avoiding possible setbacks later on during the development of projects.

Open data "ready for AI"

In the field of open science and open data, the FAIR principles have been promoted for years. This acronym states that data must be findable, accessible, interoperable and reusable. The FAIR principles have served to guide the management of scientific and open data to make them more useful and improve their use by the scientific community and society at large. However, these principles were not designed to address the new needs associated with the rise of AI.

Therefore, a proposal is currently being made to extend the original principles by adding a fifth principle of readiness for AI, thus moving from the initial FAIR to FAIR-R or FAIR². The aim would be precisely to make explicit those additional attributes that make data ready to accelerate its responsible and transparent use as a necessary tool for AI applications of high public interest.

FAIR-R Principles: Findable, Accessible, Interoperable, Reusable, Readiness. Source: own elaboration - datos.gob.es

What exactly would this new R add to the FAIR principles? In essence, it emphasizes aspects such as the following (an illustrative, machine-readable example of this kind of metadata is shown after the list):

  • Labelling, annotation and adequate enrichment of data.
  • Transparency on the origin, lineage and processing of data.
  • Standards, metadata, schemas and formats optimal for use by AI.
  • Sufficient coverage and quality to avoid bias or lack of representativeness.
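
The snippet below is a minimal sketch of what making these attributes explicit might look like in practice: a small machine-readable "AI-readiness" record published alongside an open dataset. All field names and values are assumptions for illustration; there is no standardised FAIR-R schema at this point.

```python
import json

# Hypothetical AI-readiness record accompanying an open dataset (illustrative fields only)
ai_readiness_record = {
    "dataset": "air-quality-stations-2024",
    "provenance": {
        "source": "regional environmental agency",            # origin and lineage
        "collection_method": "automatic sensors, hourly",
        "processing_steps": ["outlier removal", "unit normalisation to µg/m3"],
    },
    "annotation": {
        "labelled": True,
        "label_definitions": "pollutant thresholds per EU air quality directive",
    },
    "formats_and_standards": {
        "distribution": ["CSV", "Parquet"],
        "metadata_standard": "DCAT-AP",
    },
    "coverage_and_bias": {
        "temporal": "2024-01-01/2024-12-31",
        "spatial": "all monitoring stations in the region",
        "known_gaps": "rural stations under-represented",
    },
}

print(json.dumps(ai_readiness_record, indent=2, ensure_ascii=False))
```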

In the context of open data, this discussion is especially relevant within the discourse of the "fourth wave" of the open data movement, which argues that if governments, universities and other institutions release their data, but not in the optimal conditions to feed algorithms, a unique opportunity for a whole new universe of innovation and social impact would be missed: improvements in medical diagnostics, detection of epidemiological outbreaks, optimization of urban traffic and transport routes, maximization of crop yields or prevention of deforestation are just a few examples of the opportunities that could be lost.

Failing that, we could also enter a long "data winter", in which positive AI applications are constrained by poor-quality, inaccessible or biased datasets. In that scenario, the promise of AI for the common good would be frozen, unable to evolve due to a lack of adequate raw material, while AI applications led by private interests would continue to advance, widening unequal access to the benefits these technologies provide.

Conclusion: the path to quality, inclusive AI with true social value

We can never take for granted the quality or suitability of data for new AI applications: we must continue to evaluate it, work on it and govern it in a rigorous and effective way, just as has been recommended for other applications. Making our data AI-ready is therefore not a trivial task, but the long-term benefits are clear: more accurate algorithms, less unwanted bias, greater transparency of AI, and benefits extended to more areas in an equitable way.

Conversely, ignoring data preparation carries a high risk of failed AI projects, erroneous conclusions, or exclusion of those who do not have access to quality data. Addressing the unfinished business on how to prepare and share data responsibly is essential to unlocking the full potential of AI-driven innovation for the common good. If quality data is the foundation for the promise of more humane and equitable AI, let's make sure we build a strong enough foundation to be able to reach our goal.

On this path towards a more inclusive artificial intelligence, fuelled by quality data and with real social value, the European Union is also making steady progress. Through initiatives such as its Data Union strategy, the creation of common data spaces in key sectors such as health, mobility or agriculture, and the promotion of the so-called AI Continent and AI factories, Europe seeks to build a digital infrastructure where data is governed responsibly, interoperable and prepared to be used by AI systems for the benefit of the common good. This vision not only promotes greater digital sovereignty but reinforces the principle that public data should be used to develop technologies that serve people and not the other way around.


Content prepared by Carlos Iglesias, Open data Researcher and consultant, World Wide Web Foundation. The contents and views reflected in this publication are the sole responsibility of the author.

Blog

In the usual search for tricks to make our prompts more effective, one of the most popular is the activation of the chain of thought. It consists of posing a multilevel problem and asking the AI system to solve it, not by giving us the solution all at once, but by making visible, step by step, the logical line needed to solve it. This feature is available in both paid and free AI systems; it is all about knowing how to activate it.

Originally, the chain of reasoning was one of many semantic logic tests that developers put language models through. However, in 2022, Google Brain researchers demonstrated for the first time that providing examples of chained reasoning in the prompt could unlock greater problem-solving capabilities in models.

From this moment on, it has gradually established itself as a useful technique for obtaining better results, while at the same time being widely questioned from a technical point of view. Because what is really striking about this process is that language models do not think in a chain: they are only simulating human reasoning before us.

How to activate the reasoning chain

There are two possible ways to activate this process in the models: from a button provided by the tool itself, as in the case of DeepSeek with the "DeepThink" button that activates the R1 model:

 


Figure 1. DeepSeek with the "DeepThink" button that activates the R1 model.

Or, and this is the simplest and most common option, from the prompt itself. If we opt for this route, we can do it in two ways: with the instruction alone (zero-shot prompting) or by providing solved examples (few-shot prompting). A minimal code sketch of both approaches is included after the examples below.

  • Zero-shot prompting: as simple as adding an instruction such as "Reason step by step" or "Think before answering" at the end of the prompt. This ensures that the chain of thought is activated and that the logical steps of the solution are made visible.


Figure 2. Example of Zero-shot prompting.

  • Few-shot prompting: if we want a very precise response pattern, it may be interesting to provide some solved question-answer examples. The model sees this demonstration and imitates it as a pattern when answering a new question.


Figure 3. Example of Few-shot prompting.
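
As a minimal sketch of how both variants can be triggered programmatically, the code below builds a zero-shot and a few-shot prompt and sends them to a chat-completion endpoint using the OpenAI Python client; the model name and the wording of the solved example are assumptions for illustration, and any comparable LLM API could be used instead.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"  # illustrative model name

question = ("In a bag there are 3 red balls, 2 green balls and 1 blue ball. "
            "What is the probability of drawing a ball that is neither red nor blue?")

# Zero-shot: just append an instruction that activates the chain of thought
zero_shot = f"{question}\nReason step by step before giving the final answer."

# Few-shot: prepend a solved example so the model imitates the reasoning pattern
few_shot = (
    "Q: A bag has 2 white balls and 2 black balls. Probability of drawing white?\n"
    "A: There are 4 balls in total and 2 are white, so P = 2/4 = 1/2.\n\n"
    f"Q: {question}\nA: Let's reason step by step."
)

for name, prompt in [("zero-shot", zero_shot), ("few-shot", few_shot)]:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {name} ---")
    print(response.choices[0].message.content)
```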

Benefits and three practical examples

When we activate the chain of reasoning, we are asking the system to "show" its work before our eyes, as if it were solving the problem on a blackboard. Forcing the language model to spell out the logical steps reduces the possibility of errors, although it does not eliminate them completely, because the model focuses its attention on one step at a time. In addition, if an error does occur, it is much easier for the user of the system to detect it with the naked eye.

When is the chain of reasoning useful? Especially in mathematical calculations, logical problems, puzzles, ethical dilemmas or questions with several stages and jumps (known as multi-hop questions). In the latter it is particularly practical, above all when the model has to handle knowledge about the world that is not directly included in the question.

Let's see some examples in which we apply this technique to a chronological problem, a spatial problem and a probabilistic problem.

  • Chronological reasoning

Let's think about the following prompt:

If Juan was born in October and is 15 years old, how old was he in June of last year?


Figure 5. Example of chronological reasoning.

For this example we have used the o3 model, available in the Plus version of ChatGPT and specialized in reasoning, so the chain of thought is activated by default and it is not necessary to request it from the prompt. This model is programmed to report the time it took to solve the problem, in this case 6 seconds. Both the answer and the explanation are correct, and to arrive at them the model had to incorporate external information such as the order of the months of the year, knowledge of the current date to establish the temporal anchor, and the idea that age changes in the month of the birthday, not at the beginning of the year.

  • Spatial reasoning

  • A person is facing north. They turn 90 degrees to the right, then 180 degrees to the left. In which direction are they facing now?


    Figure 6. Example of spatial reasoning.

    This time we have used the free version of ChatGPT, which uses the GPT-4o model by default (although with limitations), so it is safer to activate the reasoning chain with an indication at the end of the prompt: Reason step by step. To solve this problem, the model needs general knowledge of the world that it has learned in training, such as the spatial orientation of the cardinal points, the degrees of rotation, laterality and the basic logic of movement.

  • Probabilistic reasoning

  • In a bag there are 3 red balls, 2 green balls and 1 blue ball. If you draw a ball at random without looking, what's the probability that it's neither red nor blue?


    Figure 7. Example of probabilistic reasoning.

    To launch this prompt we have used Gemini 2.5 Flash, in Google's Gemini Pro version. The training of this model certainly included the fundamentals of both basic arithmetic and probability, but what really teaches the model to solve this type of exercise are the millions of solved examples it has seen. Probability problems and their step-by-step solutions are the pattern it imitates when reconstructing this reasoning.
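
    For reference, the expected answer can be checked by hand, since the only favourable outcome is drawing a green ball:

```latex
P(\text{neither red nor blue}) = P(\text{green}) = \frac{2}{3 + 2 + 1} = \frac{2}{6} = \frac{1}{3} \approx 0.33
```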

The Great Simulation

Now let's turn to the questioning. In recent months, the debate about whether or not we can trust these mock explanations has grown: ideally, the chain of thought should faithfully reflect the internal process by which the model arrives at its answer, and there is no practical guarantee that this is the case.

The Anthropic team (creators of Claude, another large language model) carried out a trap experiment with Claude Sonnet in 2025, in which they slipped the model a key clue to the solution before activating the reasoned response.

Think of it like passing a student a note that says "the answer is [A]" before an exam. If they write on their exam that they chose [A] at least in part because of the note, that's good news: they are being honest and faithful. But if they write down what claims to be their reasoning process without mentioning the note, we might have a problem.

The percentage of times Claude Sonnet included the clue among its deductions was only 25%. This shows that models sometimes generate explanations that sound convincing but do not correspond to the true internal logic by which they arrived at the solution; they are rationalizations a posteriori: first they find the solution, then they invent a process that is coherent for the user. It also highlights the risk that the model may be hiding steps or information relevant to the resolution of the problem.

Closing

Despite the limitations exposed in the study mentioned above, we cannot forget that the original Google Brain research documented that, when applying the chain of thought, the PaLM model improved its accuracy on mathematical problems from 17.9% to 58.1%. If, in addition, we combine this technique with searches over open data to obtain information external to the model, the reasoning becomes more verifiable, up to date and robust.

However, by making language models "think out loud", what we are really improving in 100% of cases is the user experience in complex tasks. If we do not fall into excessive delegation of thought to AI, our own cognitive process can benefit. It is also a technique that greatly facilitates our new role as supervisors of automatic processes.


Content prepared by Carmen Torrijos, expert in AI applied to language and communication. The contents and points of view reflected in this publication are the sole responsibility of the author.

Blog

Just a few days ago, the Directorate General of Traffic published the new Framework Programme for the Testing of Automated Vehicles which, among other measures, contemplates "the mandatory delivery of reports, both periodic and final and in the event of incidents, which will allow the DGT to assess the safety of the tests and publish basic information [...] guaranteeing transparency and public trust."

Advances in digital technology are enabling the transport sector to undergo an unprecedented revolution in autonomous driving, offering significant improvements in road safety, energy efficiency and accessibility of mobility.

The final deployment of these vehicles depends to a large extent on the availability, quality and accessibility of large volumes of data, as well as on an appropriate legal framework that ensures the protection of the various legal assets involved (personal data, trade secrets, confidentiality, etc.), traffic security and transparency. In this context, open data and the reuse of public sector information are essential elements for the responsible development of autonomous mobility, in particular when it comes to ensuring adequate levels of traffic safety.

Data Dependency on Autonomous Vehicles

The technology that supports autonomous vehicles is based on the integration of a complex network of advanced sensors, artificial intelligence systems and real-time processing algorithms, which allows them to identify obstacles, interpret traffic signs, predict the behavior of other road users and, in a collaborative way, plan routes completely autonomously.

In the autonomous vehicle ecosystem, the availability of quality open data is strategic for:

  • Improve road safety, so that real-time traffic data can be used to anticipate dangers, avoid accidents and optimise safe routes based on massive data analysis.
  • Optimise operational efficiency, as access to up-to-date information on the state of roads, works, incidents and traffic conditions allows for more efficient planning of journeys.
  • To promote sectoral innovation, enabling the creation of new digital tools that improve mobility.

Specifically, ensuring the safe and efficient operation of this mobility model requires continuous access to two key categories of data:

  • Variable or dynamic data, which offers constantly changing information such as the position, speed and behaviour of other vehicles, pedestrians, cyclists or weather conditions in real time.
  • Static data, which includes relatively permanent information such as the exact location of traffic signs, traffic lights, lanes, speed limits or the main characteristics of the road infrastructure.

The prominence of the data provided by public entities

The sources from which such data come are certainly diverse. This is of great relevance as regards the conditions under which such data will be available. Specifically, some of the data are provided by public entities, while in other cases the origin comes from private companies (vehicle manufacturers, telecommunications service providers, developers of digital tools...) with their own interests or even from people who use public spaces, devices and digital applications.

This diversity requires a different approach to facilitating the availability of data under appropriate conditions, in particular because of the difficulties that may arise from a legal point of view. In relation to Public Administrations, Directive (EU) 2019/1024 on open data and the reuse of public sector information establishes clear obligations that would apply, for example, to the Directorate General of Traffic, the Administrations owning public roads or municipalities in the case of urban environments. Likewise, Regulation (EU) 2022/868 on European data governance reinforces this regulatory framework, in particular with regard to the guarantee of the rights of third parties and, in particular, the protection of personal data.

Moreover, some datasets should be provided under the conditions established for dynamic data, i.e. those "subject to frequent or real-time updates, due in particular to their volatility or rapid obsolescence", which should be available "for re-use immediately after collection, through appropriate APIs and, where relevant, as a bulk download".
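
As a purely illustrative sketch of what consuming such dynamic data might look like, the snippet below polls a hypothetical traffic-incidents API at a fixed interval; the endpoint URL, parameters and response fields are invented for the example and do not correspond to any real DGT or road-authority service.

```python
import time
import requests

# Hypothetical endpoint for dynamic traffic data (illustrative only, not a real API)
INCIDENTS_URL = "https://example.org/opendata/traffic/incidents"
POLL_SECONDS = 60  # dynamic data: re-query frequently after collection

def fetch_incidents() -> list[dict]:
    """Retrieve the current list of traffic incidents from the (hypothetical) API."""
    response = requests.get(INCIDENTS_URL, params={"format": "json"}, timeout=10)
    response.raise_for_status()
    return response.json().get("incidents", [])

if __name__ == "__main__":
    while True:
        for incident in fetch_incidents():
            # Fields such as 'road', 'km' and 'type' are assumptions for the example
            print(incident.get("road"), incident.get("km"), incident.get("type"))
        time.sleep(POLL_SECONDS)
```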

One might even think that the high-value data category is of particular interest in the context of autonomous vehicles, given its potential to facilitate mobility and, in particular, to:

  • To promote technological innovation, as they would make it easier for manufacturers, developers and operators to access reliable and up-to-date information, essential for the development, validation and continuous improvement of autonomous driving systems.
  • Facilitate monitoring and evaluation from a security perspective, as transparency and accessibility of such data are essential prerequisites from this perspective.
  • To boost the development of advanced services, since data on road infrastructure, signage, traffic and even the results of tests carried out in the context of the aforementioned Framework Programme constitute the basis for new mobility applications and services that benefit society as a whole.

However, this condition is not expressly included for traffic-related data in the definition made at European level, so, at least for the time being, public entities cannot be required to disseminate the data relevant to autonomous vehicles under the specific conditions established for high-value datasets. Nevertheless, at this time of transition towards the deployment of autonomous vehicles, it is essential that public administrations publish, and keep updated under conditions suitable for automated processing, certain datasets, such as those relating to:

  • Road signs and vertical signage elements.
  • Traffic light states and traffic control systems.
  • Lane configuration and characteristics.
  • Information on works and temporary traffic alterations.
  • Road infrastructure elements critical for autonomous navigation.

The recent update of the official catalogue of traffic signs, which comes into force on 1 July 2025, incorporates signs adapted to new realities, such as personal mobility. However, greater specificity is still needed regarding the availability of the data relating to these signs under such conditions, which will require the intervention of the authorities responsible for road signage.

The availability of data in the context of the European Mobility Area

Based on these conditions and the need to have mobility data generated by private companies and individuals, data spaces appear as the optimal legal and governance environment to facilitate their accessibility under appropriate conditions.

In this regard, the initiatives for the deployment of the European Mobility Data Space, created in 2023, constitute an opportunity to integrate into its design and configuration measures that support the need for access to data required by autonomous vehicles. Thus, within the framework of this initiative, it would be possible to unlock the potential of mobility data, and in particular:

  • Facilitate the availability of data under conditions specific to the needs of autonomous vehicles.
  • Promote the interconnection of various data sources linked to existing means of transport, but also emerging ones.
  • Accelerate the digital transformation of autonomous vehicles.
  • Strengthen the digital sovereignty of the European automotive industry, reducing dependence on large foreign technology corporations.

In short, autonomous vehicles can represent a fundamental transformation in mobility as it has been conceived until now, but their development depends, among other factors, on the availability, quality and accessibility of sufficient and adequate data. The Sustainable Mobility Bill currently being processed in Parliament is a great opportunity to strengthen the role of data in facilitating innovation in this area, which would undoubtedly favour the development of autonomous vehicles. To this end, it will be essential, on the one hand, to have a data sharing environment that makes access to data compatible with the appropriate guarantees for fundamental rights and information security; and, on the other hand, to design a governance model that, as emphasised in the Programme promoted by the Directorate-General for Traffic,  facilitates the collaborative participation of "manufacturers, developers, importers and fleet operators established in Spain or the European Union", which poses significant challenges in the availability of data.


Content prepared by Julián Valero, Professor at the University of Murcia and Coordinator of the Research Group "Innovation, Law and Technology" (iDerTec). The contents and points of view reflected in this publication are the sole responsibility of its author.

Event

Valencia City Council has launched a call to reward projects that promote the culture of open information and open data in the city. Specifically, it seeks to promote the culture of government transparency and good governance through the reuse of open data.

If you are thinking of participating, here are some of the keys you should take into account (although do not forget to read the complete rules of the call for more information).

What do the prizes consist of?

The awards consist of a single category that encompasses projects demonstrating the potential of the reuse of public open data, which may also be combined with private data. Specifically, applications, technological solutions, services, works, etc. that use public data from the city of Valencia to benefit the community may be submitted.

The requirements that must be met are the following:

  • To present an innovative character and highlight its impact on improving the lives of people and their environment.
  • Be current and implemented, in general, within the territorial area of the municipality of Valencia. Final degree projects, master's theses or doctoral theses may have been carried out at any university, but they must refer to and base their research on areas of transparency in the city of Valencia.
  • Use inclusive and non-sexist language.
  • Be written in Spanish or Valencian.
  • Have a single author, which may be a legal entity or association.
  • Be written in accordance with the terms and conditions of the call, and articles previously published in journals may not participate.
  • Not have received a subsidy from the Valencia City Council for the same purpose.

Who can participate?

The contest is aimed at audiences from wide sectors: students, entrepreneurs, developers, design professionals, journalists or any citizen with an interest in open data.

Both natural and legal persons from the university field, the private sector, public entities and civil society can participate, provided that they have developed the project in the municipality of Valencia.

What is valued and what do the prizes consist of?

The projects received will be evaluated by a jury that will take into account the following aspects:

  • Originality and degree of innovation.
  • Public value and social and urban impact.
  • Viability and sustainability.
  • Collaborative nature.

The jury will choose three winning projects, which will receive a diploma and a financial prize consisting of:

  • First prize: 5,000 euros.
  • Second prize: 3,000 euros.
  • Third prize: 2,000 euros.

In addition, the City Council will disseminate and publicize the projects that have been recognized in this call, which will be a loudspeaker to gain visibility and recognition.

The awards will be presented at a public event in person or virtually in the city of Valencia, to which all participants will be invited. An opportunity to engage in conversation with other citizens and professionals interested in the subject.

How can I participate?

The deadline for submitting projects is 7 July 2025. The application can be made in two ways, as detailed in the rules of the call.

In both cases, in addition, an explanatory report of the project will have to be presented. This document will contain the description of the project, its objectives, the actions developed and the results obtained, detailed in a maximum of 20 pages. It is also necessary to review the additional documentation indicated in the rules, necessary according to the nature of the participant (natural person, legal entity, associations, etc.).

For those participants who have doubts, the email address sctransparencia@valencia.es has been set up. Questions can also be asked by calling 962081741 or 962085203.

You can see the complete rules at this link.

Blog

Generative artificial intelligence is beginning to find its way into everyday applications ranging from virtual agents (or teams of virtual agents) that resolve queries when we call a customer service centre to intelligent assistants that automatically draft meeting summaries or report proposals in office environments.

These applications, often powered by large language models (LLMs), promise to revolutionise entire industries on the basis of huge productivity gains. However, their adoption brings new challenges because, unlike traditional software, a generative AI model does not follow fixed rules written by humans; its responses are based on statistical patterns learned from processing large volumes of data. This makes its behaviour less predictable and more difficult to explain, and sometimes leads to unexpected results, errors that are difficult to foresee, or responses that do not always align with the original intentions of the system's creator.

Therefore, the validation of these applications from multiple perspectives such as ethics, security or consistency is essential to ensure confidence in the results of the systems we are creating in this new stage of digital transformation.

What needs to be validated in generative AI-based systems?

Validating generative AI-based systems means rigorously checking that they meet certain quality and accountability criteria before relying on them to solve sensitive tasks.

It is not only about verifying that they ‘work’, but also about making sure that they behave as expected, avoiding biases, protecting users, maintaining their stability over time, and complying with applicable ethical and legal standards. The need for comprehensive validation is a growing consensus among experts, researchers, regulators and industry: deploying AI reliably requires explicit standards, assessments and controls.

We summarize below four key dimensions that need to be checked in generative AI-based systems to align their results with human expectations; a minimal sketch of how the first of them might be measured follows the list:

  • Ethics and fairness: a model must respect basic ethical principles and avoid harming individuals or groups. This involves detecting and mitigating biases in their responses so as not to perpetuate stereotypes or discrimination. It also requires filtering toxic or offensive content that could harm users. Equity is assessed by ensuring that the system offers consistent treatment to different demographics, without unduly favouring or excluding anyone.
  • Security and robustness: here we refer to both user safety (that the system does not generate dangerous recommendations or facilitate illicit activities) and technical robustness against errors and manipulations. A safe model must avoid instructions that lead, for example, to illegal behavior, reliably rejecting those requests. In addition, robustness means that the system can withstand adversarial attacks (such as requests designed to deceive you) and that it operates stably under different conditions.
  • Consistency and reliability: Generative AI results must be coherent, consistent and correct. In applications such as medical diagnosis or legal assistance, it is not enough for the answer to sound convincing; it must be true and accurate. For this reason, aspects such as the logical coherence of the answers, their relevance to the question asked and the factual accuracy of the information are validated. Stability over time is also checked (two similar requests made under the same conditions should yield equivalent results), as well as resilience (small changes in the input should not cause substantially different outputs).
  • Transparency and explainability: To trust the decisions of an AI-based system, it is desirable to understand how and why it produces them. Transparency includes providing information about training data, known limitations, and model performance across different tests. Many companies are adopting the practice of publishing "model cards," which summarize how a system was designed and evaluated, including bias metrics, common errors, and recommended use cases. Explainability goes a step further and seeks to ensure that the model offers, when possible, understandable explanations of its results (for example, highlighting which data influenced a certain recommendation). Greater transparency and explainability increase accountability, allowing developers and third parties to audit the behavior of the system.
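
By way of illustration of the fairness dimension, the sketch below compares a simple error-rate metric across demographic groups on a labelled evaluation set; the column names and the "acceptable gap" threshold are assumptions chosen for the example, not an established standard.

```python
import pandas as pd

MAX_GAP = 0.05  # assumption: largest acceptable difference in error rate between groups

def error_rate_by_group(results: pd.DataFrame) -> pd.Series:
    """Error rate per demographic group, given per-example correctness labels."""
    return 1.0 - results.groupby("group")["correct"].mean()

def fairness_check(results: pd.DataFrame) -> bool:
    rates = error_rate_by_group(results)
    gap = rates.max() - rates.min()
    print(rates.round(3).to_dict(), f"gap={gap:.3f}")
    return gap <= MAX_GAP

if __name__ == "__main__":
    # Illustrative evaluation results: one row per test example
    evaluation = pd.DataFrame({
        "group":   ["A", "A", "A", "B", "B", "B"],
        "correct": [1,   1,   0,   1,   0,   0],
    })
    print("within acceptable gap?", fairness_check(evaluation))
```

Real evaluations use richer metrics (toxicity rates, calibration, refusal behaviour) and much larger, more diverse test sets, which is precisely where open data comes in, as discussed next.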

Open data: transparency and more diverse evidence

Properly validating AI models and systems, particularly in terms of fairness and robustness, requires representative and diverse datasets that reflect the reality of different populations and scenarios.

On the other hand, if only the companies that own a system have data to test it, we have to rely on their own internal evaluations. However, when open datasets and public testing standards exist, the community (universities, regulators, independent developers, etc.) can test the systems autonomously, thus functioning as an independent counterweight that serves the interests of society.

A concrete example was given by Meta (Facebook) when it released its Casual Conversations v2 dataset in 2023. It is an open dataset, obtained with informed consent, that collects videos from people from 7 countries (Brazil, India, Indonesia, Mexico, Vietnam, the Philippines and the USA), with 5,567 participants who provided attributes such as age, gender, language and skin tone.

Meta's objective with the publication was precisely to make it easier for researchers to evaluate the impartiality and robustness of AI systems in vision and voice recognition. By expanding the geographic provenance of the data beyond the U.S., this resource allows you to check if, for example, a facial recognition model works equally well with faces of different ethnicities, or if a voice assistant understands accents from different regions.

The diversity that open data brings also helps to uncover neglected areas in AI assessment. Researchers from  Stanford's Human-Centered Artificial Intelligence (HAI) showed in the HELM (Holistic Evaluation of Language Models) project  that many language models are not evaluated in minority dialects of English or in underrepresented languages, simply because there are no quality data in the  most well-known benchmarks.

The community can identify these gaps and create new test sets to fill them (e.g., an open dataset of FAQs in Swahili to validate the behavior of a  multilingual chatbot). In this sense, HELM has incorporated broader evaluations precisely thanks to the availability of open data, making it possible to measure not only the performance of the models in common tasks, but also their behavior in other linguistic, cultural and social contexts. This has contributed to making visible the current limitations of the models and to promoting the development of more inclusive and representative systems of the real world or models more adapted to the specific needs of local contexts, as is the case of the ALIA foundational model, developed in Spain.

In short, open data contributes to democratizing the ability to audit AI systems, preventing the power of validation from residing in only a few hands. It reduces costs and barriers, as a small development team can test its model with open sets without having to invest great effort in collecting its own data. This not only fosters innovation, but also ensures that local AI solutions from small businesses are subject to the same rigorous validation standards.

The validation of applications based on generative AI is today an unquestionable necessity to ensure that these tools operate in tune with our values and expectations. It is not a trivial process: it requires new methodologies, innovative metrics and, above all, a culture of responsibility around AI. But the benefits are clear: a rigorously validated AI system will be more trustworthy, both for the individual user who, for example, interacts with a chatbot without fear of receiving a toxic response, and for society as a whole, which can accept decisions based on these technologies knowing that they have been properly audited. And open data helps to cement this trust by fostering transparency, enriching evidence with diversity, and involving the entire community in the validation of AI systems.


Content prepared by Jose Luis Marín, Senior Consultant in Data, Strategy, Innovation & Digitalization. The contents and views reflected in this publication are the sole responsibility of the author.

News

Open data is a fundamental fuel for contemporary digital innovation, creating information ecosystems that democratise access to knowledge and foster the development of advanced technological solutions.

However, the mere availability of data is not enough.  Building robust and sustainable ecosystems requires clear regulatory frameworks, sound ethical principles and management methodologies that ensure both innovation and the protection of fundamental rights. Therefore, the specialised documentation that guides these processes becomes a strategic resource for governments, organisations and companies seeking to participate responsibly in the digital economy.

In this post, we compile recent reports, produced by leading organisations in both the public and private sectors, which offer these key orientations. These documents not only analyse the current challenges of open data ecosystems, but also provide practical tools and concrete frameworks for their effective implementation.

State and evolution of the open data market

Knowing what the open data ecosystem looks like at European and national level, and what changes have occurred in it, is important for making informed decisions and adapting to the needs of the industry. In this regard, the European Commission periodically publishes a Data Markets Report. The latest version is dated December 2024, although use cases exemplifying the potential of data in Europe are published regularly (the latest in February 2025).

On the other hand, from a European regulatory perspective, the latest annual report on the implementation of the Digital Markets Act (DMA) takes a comprehensive view of the measures adopted to ensure fairness and competitiveness in the digital sector. This document is interesting to understand how the regulatory framework that directly affects open data ecosystems is taking shape.

At the national level, the ASEDIE sectoral report on the "Data Economy in its infomediary scope" 2025 provides quantitative evidence of the economic value generated by open data ecosystems in Spain.

The importance of open data in AI

It is clear that the intersection between open data and artificial intelligence is a reality that poses complex ethical and regulatory challenges that require collaborative and multi-sectoral responses. In this context, developing frameworks to guide the responsible use of AI becomes a strategic priority, especially when these technologies draw on public and private data ecosystems to generate social and economic value. Here are some reports that address this objective:

  • Generative AI and Open Data: Guidelines and Best Practices: the U.S. Department of Commerce has published a guide with principles and best practices on how to apply generative artificial intelligence ethically and effectively in the context of open data. The document provides guidelines for optimising the quality and structure of open data in order to make it useful for these systems, including transparency and governance.
  • Good Practice Guide for the Use of Ethical Artificial Intelligence: this guide demonstrates a comprehensive approach that combines strong ethical principles with clear and enforceable regulatory precepts. In addition to the theoretical framework, the guide serves as a practical tool for implementing AI systems responsibly, considering both the potential benefits and the associated risks. Collaboration between public and private actors ensures that recommendations are both technically feasible and socially responsible.
  • Enhancing Access to and Sharing of Data in the Age of AI: this analysis by the Organisation for Economic Co-operation and Development (OECD) addresses one of the main obstacles to the development of artificial intelligence: limited access to quality data and effective models. Through examples, it identifies specific strategies that governments can implement to significantly improve data access and sharing and certain AI models.
  • A Blueprint to Unlock New Data Commons for AI: Open Data Policy Lab has produced a practical guide that focuses on the creation and management of data commons specifically designed to enable cases of public interest artificial intelligence use. The guide offers concrete methodologies on how to manage data in a way that facilitates the creation of these data commons, including aspects of governance, technical sustainability and alignment with public interest objectives.
  • Practical guide to data-driven collaborations: the Data for Children Collaborative initiative has published a step-by-step guide to developing effective data collaborations, with a focus on social impact. It includes real-world examples, governance models and practical tools to foster sustainable partnerships.

In short, these reports define the path towards more mature, ethical and collaborative data systems. From growth figures for the Spanish infomediary sector to European regulatory frameworks to practical guidelines for responsible AI implementation, all these documents share a common vision: the future of open data depends on our ability to build bridges between the public and private sectors, between technological innovation and social responsibility.

Blog

Satellite data has become a fundamental tool for understanding and monitoring our planet from a unique perspective. This data, collected by satellites in orbit around the Earth, provides a global and detailed view of various terrestrial, maritime and atmospheric phenomena that have applications in multiple sectors, such as environmental care or driving innovation in the energy sector.

In this article we will focus on another field: fisheries, where satellite data have revolutionised the way fishing is monitored and managed worldwide. We will review which fisheries satellite data are most commonly used to monitor fishing activity and look at possible uses, highlighting their relevance in detecting illegal activities.

The most popular fisheries-related satellite data: positioning data

Among the satellite data, we find a large amount of public and open data, which are free and available in reusable formats, such as those coming from the European Copernicus programme. This data can be complemented with other data which, although also public, may have costs and restrictions on use or access. This is because obtaining and processing this data involves significant costs and requires purchasing it from specialised suppliers such as ORBCOMM, exactEarth, Spire Maritime or Inmarsat. To this second type belong the data from the two most popular systems for obtaining fisheries data, namely:

  1. Automatic Identification System (AIS): transmits the location, speed and direction of vessels. It was created to improve maritime safety and prevent collisions between vessels, i.e. its aim was to prevent accidents by allowing vessels to communicate their position and obtain the location of other ships in real time. However, with the release of satellite data in the 2010s, academia and authorities realised that they could improve situational awareness by providing information about ships, including their identity, course, speed and other navigational data. AIS data went on to facilitate maritime traffic management, enabling coastal authorities and traffic centres to monitor and manage the movement of vessels in their waters. This technology has revolutionised maritime navigation, providing an additional layer of safety and efficiency in maritime operations. Data is available through websites such as MarineTraffic or VesselFinder, which offer basic tracking services for free but require a subscription for advanced features.
  2. Vessel Monitoring System (VMS): designed specifically for fisheries monitoring, it provides position and movement data. It was created for the monitoring and management of the modern fishing industry, and its development emerged about two decades ago as a response to the need for improved monitoring, control and surveillance of fishing activities. Access to VMS data varies according to jurisdiction and international agreements. The data are mainly used by government agencies, regional fisheries management organisations and surveillance authorities, which have restricted access and must comply with strict security and confidentiality regulations. Fishing companies also use VMS systems to manage their fleets and comply with local and international regulations.

Analysis of fisheries satellite data

Satellite data has proven to be particularly useful for fisheries observation, as it can provide both an overview of a marine area or fishing fleet and a view of the operational life of a single vessel. The following steps are usually followed (a simplified sketch of the first three is shown after the list):

  1. AIS and VMS data collection.
  2. Integration with other open or private sources. For example: ship registers, oceanographic data, delimitations of special economic zones or territorial waters.
  3. Application of machine learning algorithms to identify behavioural patterns and fishing manoeuvres.
  4. Visualisation of data on interactive maps.
  5. Generation of alerts on suspicious activity (for real-time monitoring).
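
The sketch below illustrates steps 1 to 3 in a highly simplified form: it loads AIS-style position records, joins a vessel register, and flags low-speed segments as candidate fishing behaviour, a crude stand-in for the machine learning models actually used. The file names, columns and speed threshold are assumptions for the example.

```python
import pandas as pd

FISHING_SPEED_KNOTS = 4.0  # assumption: sustained low speed as a crude proxy for fishing

# 1. AIS-style positions (hypothetical file and columns: mmsi, timestamp, lat, lon, speed_knots)
positions = pd.read_csv("ais_positions.csv", parse_dates=["timestamp"])

# 2. Integration with another source: a hypothetical vessel register (mmsi, name, flag, gear_type)
register = pd.read_csv("vessel_register.csv")
tracks = positions.merge(register, on="mmsi", how="left")

# 3. Very simple rule instead of a trained model: flag slow-moving segments
tracks["possible_fishing"] = tracks["speed_knots"] < FISHING_SPEED_KNOTS

# Summary per vessel: share of positions flagged as possible fishing activity
summary = (
    tracks.groupby(["mmsi", "name", "flag"])["possible_fishing"]
    .mean()
    .sort_values(ascending=False)
)
print(summary.head(10))
```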

Use cases of fisheries satellite data

Satellite fisheries data offer cost-effective options for continuously monitoring large expanses of ocean, especially for those with limited resources to patrol their waters. Among other activities, these data make it possible to develop systems that allow:

  1. Monitoring of compliance with fishing regulations, as satellites can track the position and movements of fishing vessels. This monitoring can be done with historical data, in order to perform an analysis of fishing activity patterns and trends. This supports long-term research and strategic analysis of the fisheries sector.
  2. The detection of illegal fishing, using both historical and real-time data. By analysing unusual movement patterns or the presence of vessels in restricted areas, possible illegal, unreported and unregulated (IUU) fishing activities can be identified. IUU fishing is worth up to US$23.5 billion per year in seafood products.
  3. The assessment of the fishing volume, with data on the carrying capacity of each vessel and the fish transhipments that take place both at sea and in port.
  4. The identification of areas of high fishing activity and the assessment of their impact on sensitive ecosystems.

 A concrete example is work by the Overseas Development Institute (ODI), entitled "Turbid Water Fishing", which reveals how satellite data can identify vessels, determine their location, course and speed, and train algorithms, providing unprecedented insight into global fishing activities. The report is based on two sources: interviews with the heads of various private and public platforms dedicated to monitoring IUU fishing, as well as free and open resources such as Global Fishing Watch (GFW) - an organisation that is a collaboration between Oceana, SkyTruth and Google - which provides open data.

Challenges, ethical considerations and constraints in monitoring fishing activity

 

While these data offer great opportunities, it is important to note that they also have limitations. The study "Fishing for data: The role of private data platforms in addressing illegal, unreported and unregulated fishing and overfishing", mentions the problems of working with satellite data to combat illegal fishing, challenges that can be applied to fisheries monitoring in general:

  1. The lack of a unified universal fishing vessel register. There is a lack of a single database of fishing vessels, which makes it difficult to identify vessels and their owners or operators. Vessel information is scattered across multiple sources such as classification societies, national vessel registers and regional fisheries management organisations.
  2. Deficient algorithms. Algorithms used to identify fishing behaviour are sometimes unable to accurately identify fishing activity, making it difficult to identify illegal activities. For example, inferring the type of fishing gear used, target species or quantity caught from satellite data can be complex.
  3. Most of these data are not free and can be costly. The most commonly used data in this field, i.e. data from AIS and VMS systems, come at considerable cost.
  4. Incomplete satellite data. Automatic Identification Systems (AIS) are mandatory only for vessels over 300 gross tonnes, which leaves out many fishing vessels. In addition, vessels can turn off their AIS transmitters to avoid surveillance.
  5. The use of these tools for surveillance, monitoring and law enforcement carries risks, such as false positives and spurious correlations. In addition, over-reliance on these tools can divert enforcement efforts away from undetectable behaviour.
  6. Collaboration and coordination between various private initiatives, such as Global Fishing Watch, is not as smooth as it could be. If they joined forces, they could create a more powerful data platform, but it is difficult to incentivise such collaboration between competing organisations.
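
Point 4 can be made concrete with a simple heuristic: looking for long gaps between consecutive AIS transmissions from the same vessel, which may indicate a switched-off transponder. As point 5 warns, however, gaps also occur for innocent reasons such as poor satellite coverage, so they are candidates for review rather than proof of wrongdoing. The sketch below, which assumes the same hypothetical position-report table as the previous example, lists gaps longer than a configurable threshold.

```python
import pandas as pd

def find_ais_gaps(reports: pd.DataFrame, max_gap_hours: float = 12.0) -> pd.DataFrame:
    """List transmission gaps longer than `max_gap_hours` per vessel (MMSI).

    `reports` is expected to have columns mmsi and timestamp (illustrative schema).
    A long gap is only a signal for review, not evidence of illegal activity.
    """
    reports = reports.sort_values(["mmsi", "timestamp"]).copy()
    # Time elapsed since the previous transmission of the same vessel.
    reports["gap"] = reports.groupby("mmsi")["timestamp"].diff()
    threshold = pd.Timedelta(hours=max_gap_hours)
    gaps = reports[reports["gap"] > threshold]
    return gaps[["mmsi", "timestamp", "gap"]].rename(
        columns={"timestamp": "transmission_after_gap"}
    )

if __name__ == "__main__":
    positions = pd.read_csv("ais_positions.csv", parse_dates=["timestamp"])
    print(find_ais_gaps(positions, max_gap_hours=24))
```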

The future of satellite data in fisheries

The field of satellite data is in constant evolution, with new techniques for capture and analysis improving the accuracy and utility of the information obtained. Innovations in geospatial data capture include the use of drones, LiDAR (light detection and ranging) and high-resolution photogrammetry, which complement traditional satellite data. In the field of analytics, machine learning and artificial intelligence are playing a crucial role. For example, Global Fishing Watch uses machine learning algorithms to process millions of daily messages from more than 200,000 fishing vessels, allowing a global, real-time view of their activities.
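
To give an idea of how this kind of machine learning works, the sketch below trains a generic classifier on movement features that could be derived from AIS tracks (mean speed, speed variance, course change), with labels indicating whether a track segment corresponds to fishing behaviour. The features and labels here are randomly generated placeholders, and this is not Global Fishing Watch's actual model; it simply illustrates the supervised-learning approach described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Illustrative feature matrix: one row per AIS track segment with
# [mean speed, speed variance, mean course change]. In practice these
# would be derived from real AIS tracks; here they are random placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
# Placeholder labels: 1 = fishing behaviour, 0 = transiting.
y = (X[:, 1] + 0.5 * X[:, 2] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out segments.
print(classification_report(y_test, model.predict(X_test)))
```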

The future of satellite data is promising, with technological advances offering improvements in the resolution, frequency, volume, quality and types of data that can be collected. The miniaturisation of satellites and the development of microsatellite constellations are improving access to space and the data that can be obtained from it.

In the context of fisheries, satellite data are expected to play an increasingly important role in the sustainable management of marine resources. Combining these data with other sources of information, such as in situ sensors and oceanographic models, will allow a more holistic understanding of marine ecosystems and the human activities that affect them.


Content prepared by Miren Gutiérrez, PhD and researcher at the University of Deusto, expert in data activism, data justice, data literacy and gender disinformation. The contents and views reflected in this publication are the sole responsibility of the author.

Blog

Access to financial and banking data is revolutionising the sector, promoting transparency, financial inclusion and innovation in economic services. However, the management of this data faces regulatory challenges in balancing openness with security and privacy.

For this reason, there are different ways of accessing this type of data, as we will see below.

Open Banking and Open Finance versus Open Data

These terms, although related, have important differences.

The term Open Banking refers to a system that allows banks and other financial institutions to securely and digitally share customer financial data with third parties. This requires the customer's express approval of the data sharing conditions, and the consent can be withdrawn at any time according to the customer's wishes.

Open Finance, on the other hand, is an evolution of Open Banking which embraces a broader range of financial products and services. When we talk about Open Finance, in addition to banking data, data on insurance, pensions, investments and other financial services are included.

In both Open Banking and Open Finance, the data is not open (Open Data), but can only be accessed by those previously authorised by the customer. The exchange of data is done through an application programming interface (API), which guarantees the agility and security of the process. All of this is regulated by the European directive on payment services in the internal market (known as PSD2), although the European Commission is working on updating the regulatory framework.
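
Conceptually, a third-party provider's request to an account information API looks something like the sketch below. The endpoint, headers and response fields are hypothetical placeholders, since each institution exposes its own PSD2-compliant API, and access always depends on a token obtained after the customer has granted consent.

```python
import requests

API_BASE = "https://api.example-bank.com/open-banking/v1"  # hypothetical endpoint
CONSENT_TOKEN = "..."  # access token obtained after the customer grants consent

def list_accounts() -> list[dict]:
    """Fetch the accounts the customer has agreed to share (illustrative only)."""
    response = requests.get(
        f"{API_BASE}/accounts",
        headers={"Authorization": f"Bearer {CONSENT_TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("accounts", [])

if __name__ == "__main__":
    for account in list_accounts():
        # Field names are placeholders; real APIs define their own schemas.
        print(account.get("iban"), account.get("balance"))
```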

Applications of Open Banking and Open Finance

The purpose of these activities is to provide access to new services based on information sharing. For example, they facilitate the creation of apps that unify access to all the bank accounts of a customer, even if they are from different providers. This improves the management and control of income and expenditure by providing an overview in a single environment.

Another use case is faster cross-checking of information between providers: for example, with consented access to a customer's financial data, a dealer could provide information on financing options more quickly.

Open data platforms on banking

While private banking data, like all personal data, is strictly regulated and cannot be openly published due to privacy protection rules, there are sets of financial data that can be freely shared: for example, aggregate information on interest rates, economic indicators, historical stock market data, investment trends and macroeconomic statistics, all of which are accessible through open sources.

This data, in addition to boosting transparency and confidence in markets, can be used to monitor economic trends, prevent fraud and improve risk management globally. In addition, fintech companies, developers and entrepreneurs can take advantage of it to create solutions such as financial analysis tools, digital payment systems or automated advice.

Let's look at some examples of places where open data on the banking and financial sector can be obtained.

International sources

Some of the most popular international sources are:

  • European Central Bank: provides statistics and data on euro area financial markets, through various platforms. Among other information, users can download datasets on inflation, bank interest rates, balance of payments, public finances, etc.

  • World Bank: provides access to global economic data on financial development, poverty and economic growth.

  • International Monetary Fund: provides simplified access to macroeconomic and financial data, such as the outlook for the global or regional economy. It also provides open data from reports such as its Fiscal Monitor, which analyses the latest developments in public finances.

  • Federal Reserve Economic Data (FRED): focuses on US economic data, including market indicators and interest rates. This repository is created and maintained by the Research Department of the Federal Reserve Bank of St. Louis (see the sketch after this list for a quick way to query it programmatically).
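
As a small example of how this kind of open data can be reused programmatically, the sketch below downloads a US interest-rate series from FRED using the pandas-datareader library (assuming it is installed); DGS10, the 10-year Treasury constant maturity rate, is just one of the many series available.

```python
from datetime import datetime

import pandas_datareader.data as web

# Download the 10-year US Treasury constant maturity rate (FRED series DGS10).
start = datetime(2020, 1, 1)
end = datetime(2024, 12, 31)
dgs10 = web.DataReader("DGS10", "fred", start, end)

print(dgs10.tail())                             # latest observations in the range
print("Average rate:", dgs10["DGS10"].mean())   # simple aggregate over the period
```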

National sources

Through the National Open Data Catalogue of datos.gob.es, a large number of datasets related to the economy can be accessed. One of the most prominent publishers is the Instituto Nacional de Estadística (INE), which provides data on defaults by financial institution, mortgages, etc.

In addition, the Banco de España offers various products for those interested in the country's economic data:

  • Statistics: the Banco de España collects, compiles and publishes a wide range of economic and financial statistics. It includes information on interest and exchange rates, financial accounts of the institutional sectors, balances of payments and even household financial surveys, among others.
  • Dashboard: the Banco de España has also made available to the public an interactive viewer that allows quarterly and annual data on external statistics to be consumed in a more user-friendly way.

In addition, the Banco de España has set up a secure room for researchers to access data that is valuable but cannot be opened to the general public due to its nature. In this respect we find:

  • BELab: the secure data laboratory managed by the Banco de España, offering on-site (Madrid) and remote access. These data have been used in various projects.
  • ES_DataLab: a restricted microdata laboratory for researchers developing projects for scientific and public interest purposes. In this case, it brings together microdata from various organisations, including the Banco de España.

Data spaces: an opportunity for secure and controlled exchange of financial data

As we have just seen, there are options to facilitate access to financial and banking data in a controlled and secure manner. This is where data spaces come into play: ecosystems in which different actors share data voluntarily and securely, following common governance, regulatory and technical mechanisms.

In this respect, Europe is pushing for a European Financial Data Space (EEDF), a key initiative within the European Data Strategy. The EEDF rests on three main pillars:

  • Public reporting data ("public disclosures"): collects financial reporting data (balance sheets, revenues, income statements), which financial firms are required by law to disclose on a regular basis. In this area is the European Single Access Point (ESAP) initiative, a centralised platform for accessing data from over 200 public reports from more than 150,000 companies.
  • Private customer data of financial service providers: encompasses those data held by financial service providers such as banks. In this area is the framework for access to financial data, which covers data such as investments, insurance, pensions, loans and savings.
  • Data from supervisory reports: for this type of data, the supervisory data strategy, which covers data from different sectors (banks, insurance, pension funds, etc.), must be taken into account in order to promote the digital transformation of the financial sector.

In conclusion, access to financial and banking data is evolving significantly thanks to various initiatives that have enabled greater transparency and that will encourage the development of new services, while ensuring the security and privacy of shared data. The future of the financial sector will be shaped by the ability of institutions and regulators to foster data ecosystems that drive innovation and trust in the market.

Noticia

Data reuse continues to grow in Spain, as confirmed by the latest report of the Multisectorial Association of Information (ASEDIE), which analyses and describes the situation of the infomediary sector in the country. The document, now in its 13th edition, was presented last Friday, 4 April, at an event highlighting the rise of the data economy in the current landscape.

The following are the main key points of the report.

An overall profit of 146 million euros in 2023

Since 2013, ASEDIE's Infomediary Sector Report has been continuously monitoring this sector, made up of companies and organisations that reuse data - generally from the public sector, but also from private sources - to generate value-added products or services. Under the title "Data Economy in its infomediary scope", this year's report underlines the importance of public-private partnerships in driving the data economy and presents relevant data on the current state of the sector.

It should be noted that the financial information used for sales and employees corresponds to the financial year 2023, as financial information for the year 2024 was not yet available at the time of reporting. The main conclusions are:

  • Since the first edition of the report, the number of infomediaries identified has risen from 444 to 757, an increase of 70%. This growth reflects its dynamism, with annual peaks and troughs, showing a positive evolution that consolidates its recovery after the pandemic, although there is still room for development.
  • The sector is present in all the country's Autonomous Communities, including the Autonomous City of Melilla. The Community of Madrid leads the ranking with 38% of infomediaries, followed by Catalonia, Andalusia and the Community of Valencia, which represent 15%, 11% and 9%, respectively. The remaining 27% is distributed among the other autonomous communities.
  • 75% of infomediary companies operate in the sub-sectors of geographic information, market, economic and financial studies, and infomediation informatics (focused on the development of technological solutions for the management, analysis, processing and visualisation of data).
  • The infomediary sector shows a growth and consolidation trend, with 66% of companies operating for less than 20 years. Of this group, 32% are between 11 and 20 years old, while 34% are less than a decade old. Furthermore, the increase in companies between 11 and 40 years old indicates that more companies have managed to sustain themselves over time.
  • In terms of sales, the estimated volume amounts to 2,646 million euros, with average sales growing by 10.4%. The average turnover per company is over 4.4 million euros, while the median is 442,000 euros. Compared to the previous year, the average has increased by 200,000 euros, while the median has decreased by 30,000 euros.
  • It is estimated that the infomediary sector employs some 24,620 people, 64% of whom are concentrated in three sub-sectors. These figures represent a growth of 6% over the previous year. Although the overall average is 39 employees per company, the median per sub-sector is no more than 6, indicating that much of the employment is concentrated in a small number of large companies. The average turnover per employee was 108,000 euros this year, an increase of 8% compared to the previous year.
  • The subscribed capital of the sector amounts to 252 million euros. This represents an increase of 6%, which breaks the negative trend of recent years.
  • 74% of the companies have reported profits. The aggregate net profit of the 539 companies for which data is available exceeded 145 million euros.

The following visual summarises some of this data:

Figure 1. Key figures of the infomediary sector: 757 companies identified; 24,620 employees; 2,646 million euros in sales; 252 million euros in subscribed capital; 146 million euros in net profit. Source: Asedie Infomediary Sector Report, "Data Economy in its infomediary scope" (2025).

 

Significant advances in the ASEDIE Top 10

The Asedie Top 10 aims to identify and promote the openness of selected datasets for reuse. This initiative seeks to foster collaboration between the public and private sectors, facilitating access to information that can generate significant economic and social benefits. Its development has taken place in three phases, each focusing on different datasets, the evolution of which has been analysed in this report:

  • Phase 1 (2019), which promoted the opening of databases of associations, cooperatives and foundations. Currently, 16 Autonomous Communities allow access to the three databases and 11 already offer NIF data. Access to the cooperatives database is still missing in one community.
  • Phase 2 (2020), focusing on datasets related to energy efficiency certificates, SAT registers and industrial estates. All communities have made energy efficiency data available to citizens, but one is still missing for industrial estates and three for SAT registers.
  • Phase 3 (2023), focusing on datasets of economic agents, education centres, health centres and ERES-ERTES (Expediente de Regulación de Empleo y Expediente de Regulación Temporal de Empleo). Progress has been made compared to last year, but work is ongoing to achieve greater uniformity of information.

New success stories and best practices

The report concludes with a section compiling several success stories of products and services developed with public information and contributing to the growth of our economy, for example:

  • Energy Efficiency Improvement Calculator: allows users to identify the necessary interventions and estimate the associated costs and the impact on the energy efficiency certification (EEC).
  • GEOPUBLIC: a tool designed to help Public Administrations better understand their territory. It allows for an analysis of strengths, opportunities and challenges in comparison with other similar regions, provinces or municipalities. Thanks to its ability to segment business and socio-demographic data at different scales, it facilitates the monitoring of the life cycle of enterprises and their influence on the local economy.
  • New website of the DBK sectoral observatory: improves the search for sectoral information, thanks to the continuous monitoring of some 600 Spanish and Portuguese sectors. Every year it publishes more than 300 in-depth reports and 1,000 sectoral information sheets.
  • Data assignment and repair service: facilitates the updating of information on the customers of electricity retailers by allowing this information to be enriched with the cadastral reference associated with the supply point. This complies with a requirement of the State Tax Administration Agency (AEAT).

The report also includes good practices from public administrations.

In conclusion, the infomediary sector in Spain is consolidating its position as a key driver of the economy, showing a solid evolution and steady growth. With a record number of companies and a turnover exceeding 2.6 billion euros in 2023, the sector not only generates employment, but also positions itself as a benchmark for innovation. Information as a strategic resource drives a more efficient and connected economic future. Its proper use, always from an ethical perspective, promises to continue to be a source of progress both nationally and internationally.
