Every time an asthmatic person checks the level of suspended particles before going for a run, or when a city council decides to close a playground due to a pollution episode, there is open data behind that decision. Open environmental and climate data – on air quality, water, biodiversity or extreme events – are no longer the exclusive preserve of scientists and institutions and have become a global civic infrastructure: accessible, reusable and, increasingly, generated by citizens themselves.
The question is no longer whether these data exist. They exist, and in unprecedented quantities. The question is: who uses them and for what? This article covers this ecosystem of information, from global platforms to local repositories, including citizen projects that have transformed our environment.
The citizen turn: from consumers to data producers
For decades, environmental data came mainly from state agencies, government satellites, and large laboratories. That panorama began to change when sensors became cheaper, smartphones became ubiquitous, and organized communities understood that measuring their environment was also a way to protect themselves and their surroundings. In this way, the information generated by citizens is added to that of public bodies, expanding and enriching the collective understanding of the environment. Some examples are:
- iNaturalist, the citizen science platform for documenting biodiversity, accumulates more than 200 million observations, made by 3.3 million participants worldwide. Its data, integrated into GBIF (Global Biodiversity Information Facility), is used in conservation research, in monitoring the impacts of climate change and in biodiversity policies in dozens of countries.
- IQAir AirVisual is a global air quality network, with more than 30,000 stations and real-time data from more than 100 countries, including maps, 7-day forecasts, and recommendations for vulnerable groups.
- NASA's GLOBE Observer has been allowing anyone to record observations of clouds, temperature, ground cover and mosquito habitats from their mobile phone since 2016 – a critical indicator for detecting pockets of vector-borne diseases aggravated by global warming.
- Meteoclimatic is a collaborative network of automatic weather stations that share data in real time, focused on the Iberian Peninsula and nearby areas.
These projects show that citizens no longer only consume data: they also produce it, validate it and make it available to the public.
Cases that change policies: from citizen data to public decision
One of the most persistent prejudices about citizen science is that its data is too imprecise to have a real impact or even to be considered science. Several recent projects challenge that assumption and are helping to change it.
The European COMPAIR project, funded by the Horizon Europe programme between 2021 and 2024, deployed citizen air quality sensors in five pilot areas: Athens, Berlin, Flanders, Plovdiv and Sofia. Citizen sensors are low-cost environmental measurement devices (air, noise, temperature, water, etc.) that citizens themselves install, maintain and use to generate open data on their immediate environment. These sensors were installed in neighbourhoods and spaces used by Roma communities, the elderly and schoolchildren. This choice aimed to make visible exposures to risk that are usually off the radar (for example, school routes with heavy traffic or insufficiently monitored peripheral neighbourhoods) and to provide additional data that would allow administrations to design measures aimed at those who breathe the most polluted air. In Sofia, for example, the publication of pollution maps at school entrances led to a documented increase in the use of public school transport: a citizen data project that changed collective behaviour.
Citizen data, moreover, when it respects methodological and legal conditions, can be admissible before the courts and contribute to policy improvement. It is precisely this type of use that is being developed by the Sensing for Justice (SensJus) project: a Marie Curie initiative – a prestigious European research funding programme – that uses networks of citizen sensors as evidence in environmental litigation and out-of-court mediations, with success stories documented in the United States and Italy.
Projects located in closer contexts
Environmental or climate data activism is not a distant phenomenon. Spain has a growing network of initiatives that take environmental measurement to the scale of the neighbourhood, river and roof.
Smart Citizen, created by the Barcelona Fab Lab of the Institute for Advanced Architecture of Catalonia (IAAC), is one of the world's leading projects in citizen sensing: it combines a low-cost sensor kit – air quality, temperature, humidity, noise, light – with an open real-time data platform. With more than 9,000 registered users and more than 1,900 sensors deployed in more than 40 countries, it demonstrates that citizen environmental monitoring can have a global reach based on a local initiative. SensaCitizens, on the other hand, is a Spanish network of low-cost environmental monitors, with LoRaWAN technology, aimed at generating useful data for local public policies on air quality and urban comfort.
In the field of everyday health, Planttes stands out: an app developed by the Autonomous University of Barcelona that allows citizens to map the allergenic plants in their environment in real time and indicate their phenological state. The result is a street-level allergy risk map that complements the official pollen information.
River surveillance, for its part, takes shape in the Cantabria Rivers Project, active since 2008, in which volunteers adopt river sections of about 500 meters and carry out biannual inspections to monitor their ecological status. With 282 sections inspected and more than 300 documented actions, its data feeds the Administration's decisions on ecosystem conservation.
In the urban area there are also various initiatives. Vitoria-Gasteiz participated in the European CITI-SENSE project – together with eight other cities, such as Barcelona, Oslo and Vienna – with a specific initiative for the participatory design of public spaces, which combined noise, air quality and thermal comfort sensors. In total, the project generated more than 9.4 million observations in the participating cities.
At the citizen and local level, the recently presented SolData Spain (Universidad Autónoma de Madrid) offers an open-access geoportal to analyse the evolution of solar energy in Spain over almost three decades, by cross-referencing satellite irradiation data with historical meteorological records.
The following chart summarizes the projects mentioned so far:

Figure 1. Open data and citizen science in the fight against climate change. Source: own elaboration - datos.gob.es
Companies are also taking action: climate adaptation with open data
The climate crisis is not only an environmental and social issue, but also an operational risk for businesses. According to the Smart Electric Power Alliance's Utility Transformation Profile report (2023), 62% of the utilities surveyed have developed a public carbon reduction plan (mitigation), but climate adaptation measures remain scarce and rarely quantified. However, we find some examples:
Meteoflow, from Iberdrola, is a weather forecasting platform recognised by the International Research Centre on Artificial Intelligence (IRCAI), linked to UNESCO, as one of the best AI projects for sustainability. Its function is to optimise the production of its wind and solar farms by anticipating weather conditions, although it also incorporates alert modules for extreme phenomena that allow it to manage risks. To do this, it uses open-access meteorological information together with its own data, both historical and real-time production data.
Another example is dotGIS's Solarmap, which combines open data from cartography (CNIG/INSPIRE), solar radiation (AEMET/Copernicus) and geospatial big data to calculate the profitability per roof of installing solar panels anywhere in Spain.
These initiatives strengthen the operational resilience of companies and offer potential benefits for the collective energy system, but their transformative scope grows when their results transcend the logic of protecting corporate assets and become part of shared public infrastructure. In the private sector, data (open or not) can help reduce collective risks, such as avoiding blackouts or anticipating impacts on ecosystems. That civic potential multiplies when the accessible results, methodologies and datasets are integrated into data infrastructures shared with other public and social actors.
Where we're headed: three trends redefining security
The landscape of open environmental and climate data is changing, and three trends are on the near horizon.
- From mitigation to adaptation. For years, climate policy focused on reducing emissions. That focus is shifting to adaptation: anticipating risks, reducing vulnerabilities, and protecting communities from irreversible changes. The Basque Country is an example of this turn: the Basque Country Climate Change Strategy 2050 (KLIMA 2050) establishes adaptation as a cross-cutting axis with objectives by sector – health, water, biodiversity, energy – and Vitoria-Gasteiz leads commitments within the framework of the European Mission for Climate Neutral Cities. This shift also extends to open data: in addition to measuring and publishing open data on emissions, adaptation strategies are beginning to open and document data on vulnerabilities (heatwaves, floods, public health) and institutional and community response capacities, so that they can be reused by administrations, companies and citizens.
- Citizenship as a civic sentinel. Citizens increasingly act as civic sentinels: they collect, contrast and share environmental data that complement official measurements. When a city's neighborhood detects pollution levels that official stations don't record, or when an indigenous community documents changes in its ecosystem that satellites don't capture, a second layer of information critical to risk management is generated. Projects such as OpenTEK/LICCI of the ICTA-UAB integrate indigenous knowledge from Nepal, Thailand, Vietnam and Latin American countries as scientifically legitimate sources of data on climate variability, and make them available under FAIR and CARE principles, seeking to make them as open and reusable as possible without compromising the sovereignty of the communities that generate them.
- FAIR standards and European data spaces. The European Union promotes the Green Deal Data Space and the AD4GD initiative to integrate open environmental data under FAIR (findable, accessible, interoperable, reusable) standards, which facilitate its combined use by multiple actors. In this framework, the EU's Strategic Foresight Report 2025 identifies the climate transition and security as the two axes that exert the greatest pressure on Europe, and underlines the need for shared data infrastructures based on these principles to respond with resilience.
Open environmental and climate data is not just a technical issue or an aspiration for bureaucratic transparency. It is, increasingly, a condition for communities to be able to anticipate risks, demand accountability and make well-informed collective decisions in the face of the greatest challenge. The infrastructure exists. So do the citizens who use it, which is why we must keep promoting it to ensure that its use is universal and equitable.
Appendix: Where is the data? A practical guide to repositories
For anyone who wants to explore, reuse, or combine open environmental and climate data, the ecosystem of repositories is vast and increasingly accessible. Here is a selection organized by scale:
At European level:
- data.europa.eu: European data catalogue, where you can find, among others, data on air, water, biodiversity, climate and energy, with documented use cases
- Copernicus C3S: historical climate data and projections; essential climate variables.
- Copernicus CAMS: real-time air quality and atmospheric composition data.
- ESA CCI Open Data: essential climate variables on glaciers, sea level, greenhouse gases, etc.
- European Environment Agency (EEA): environmental indicators, climate risk maps, biodiversity and water quality.
- INSPIRE: the central access point for locating, visualizing and accessing geographic information and harmonized spatial data from the member countries of the European Union.
You can find more repositories of interest in this article 10 climate-related public data repositories.
At the state level:
- datos.gob.es: National open data portal with a section on the environment.
- AEMET OpenData: historical climatological series, real-time weather data, REST API, etc. (see the query sketch after this list).
- MITECO Open Data: data on air, water and emissions quality.
- IDEE Geoportal: geospatial data and national environmental cartography, among others.
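As a quick illustration of programmatic access to one of these repositories, here is a hedged sketch against AEMET OpenData. It assumes a free API key has been requested on the AEMET portal, and the endpoint path shown is only an example of the service's documented two-step pattern (a first call returns a small JSON whose "datos" field points to the actual data); check the current AEMET OpenData documentation for the exact routes.

import requests

API_KEY = "your-aemet-api-key"  # obtained free of charge from the AEMET OpenData portal

# Example route (verify it against the AEMET OpenData documentation).
url = "https://opendata.aemet.es/opendata/api/valores/climatologicos/inventarioestaciones/todasestaciones"

# First call: AEMET returns metadata whose "datos" field contains the URL of the real data.
meta = requests.get(url, params={"api_key": API_KEY}, timeout=30).json()
data = requests.get(meta["datos"], timeout=30).json()

print(len(data), "weather stations retrieved")
print(data[0])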
Content produced by Miren Gutiérrez, PhD and researcher at the University of Deusto, an expert in data activism, data justice, data literacy and gender-based disinformation. The content and views expressed in this publication are the sole responsibility of the author.
Just a few months after the success of the first edition, the Madrid City Council has opened the call for the second edition of the Open Data Reuse Awards. It is an initiative that seeks to recognize and promote innovative projects that use the datasets published on the datos.madrid.es portal. With a total endowment of 15,000 euros, these awards consolidate the municipal commitment to data culture, transparency and the creation of social and economic value from public information.
In this article we tell you some of the keys you must take into account to participate.
Two award categories to consider
The call establishes two categories, each with several prizes:
1) Web services, applications and visualizations: rewards projects that generate services, visualizations or web or mobile applications.
- First prize: €4,000
- Second prize: €3,000
- Third prize: €1,500
- Student prize: €1,500
2) Studies, research and ideas: focuses on research projects, analysis or description of ideas to create services, studies, visualizations, web or mobile applications. This category is also open to university end-of-degree and end-of-master's projects (TFG-TFM).
- First prize: €2,500
- Second prize: €1,500
- Third prize: €1,000
In both categories, at least one dataset from the municipal portal must be used, and it may be combined with public or private sources from any territorial area. Projects can be recent or have been completed in the two years prior to the closing of the call.
Awards may be declared void if the minimum quality is not reached. In this case, the remaining amounts will be redistributed proportionally among the rest of the winners.
Requirements to participate
The call is open to natural and legal persons who are the authors of the projects or initiatives. The aim is for any person or entity with an interest in the reuse of data to be able to submit a proposal, regardless of their technical level. Therefore, professionals and companies, researchers, journalists and developers can participate, as well as hobbyists interested in data analysis and visualization.
In the case of the student prize, only individuals enrolled in official studies during the 2023/24, 2024/25 or 2025/26 academic years may participate.
On the other hand, the following are excluded from all categories:
- Projects already awarded, subsidized or contracted by the Madrid City Council.
- Projects that do not use any datasets from the municipal portal.
Process Phases
The municipal portal details the phases of the call, which include:
- Publication of the call. On March 3, the regulatory bases were published in the Official Gazette of the Madrid City Council.
- Submission of nominations. Applications may be submitted from March 4 to May 4 (both inclusive), online or in person, as explained below.
- Analysis and correction. Until June 3, the review of the documentation submitted will be carried out. If necessary, applicants will be contacted to correct errors.
- Assessment and deliberation. A jury will evaluate all the admitted projects, according to the criteria established in the rules of the call. Account will be taken of their usefulness, economic value, social value and contribution to transparency; their degree of innovation and creativity; the variety of datasets used from the Madrid Open Data Portal; and their technical quality. This phase will run until September 15.
- Resolution. In the months of September and October, the award proposal and the official publication of the resolution will be carried out.
- Awards ceremony. The awards will be presented at a public event, expected in November.
The official website will update dates and documentation as the process progresses.
How applications are submitted
As mentioned above, applications can be submitted electronically or in person:
- Online, through the electronic office of the Madrid City Council. Identification and electronic signature are required for this.
- In person, at the registration assistance offices of the Madrid City Council, as well as at the registries of other public administrations.
Individuals may submit the application in either way, while legal persons may only submit it electronically.
In both cases, nominations must include:
- Official application form, to be downloaded from the Madrid City Council's electronic office.
- Project report, based on a model to be downloaded from the aforementioned electronic office. This document will include the title, authorship and a detailed description, as well as the list of datasets used, the objectives, the target audience, the expected impact, the degree of innovation and the technology used.
- Responsible declaration.
- Collaboration agreement, in the case of applying as a group.
Get inspired by the winning projects of the first edition
The second edition of the Open Data Reuse Awards comes on the heels of the success of the previous edition. In 2025, the Madrid City Council held the first edition of these awards, which brought together 65 nominations of great quality and diversity. Among them, proposals promoted by university students, startups, multidisciplinary teams and citizens committed to the intelligent use of public data stood out.
The award-winning projects demonstrated that open data can become real tools to improve urban life, boost transparency and generate useful knowledge for the city. In this article we summarize what these projects consisted of.
In summary, the II Open Data Reuse Awards 2026 are an opportunity to demonstrate how public data can be turned into real innovation. An invitation to develop projects that promote a smarter, more transparent and participatory Madrid.
On Wednesday, March 4, the Cajasiete Big Data, Open Data and Blockchain Chair of the University of La Laguna held a webinar to present the winning ideas of the Cabildo de Tenerife Open Data Contest: Reuse Ideas. An event to highlight the potential of public information when it is put at the service of citizens. The recording of the presentation is available here.
In this post we will review what each of the winning projects consists of – ideas that are still pending development into apps – and what challenges they would address.
Cultiva+ Tenerife: precision agriculture for the Tenerife countryside
The first prize-winning project was born from a very specific need that every farmer on the island knows well: to make the right decisions at the right time. Which crop is most profitable this season? What are the weather conditions forecast for the coming weeks? Is there a fair or event in the sector that should not be missed?
Cultiva+ Tenerife is an application designed specifically for the agricultural sector that integrates open data from the Cabildo to answer these questions in a simple and intuitive way.
Specifically, it is aimed at both workers already established in the sector and new farmers. In the first case, the app would facilitate daily work through irrigation recommendations and other guidance that improves production, while for new farmers the application would help to select the best plot to start an agricultural activity according to soil type, weather conditions, etc.

Figure 1. Possible uses of the Cultiva+ Tenerife application according to the type of user. Source: presentation by Cultiva+Tenerife in the Webinar "From data to innovation: Reuse ideas awarded in the I Open Data Contest of the Cabildo de Tenerife, Universidad de la Laguna".
The application would intuitively and clearly collect information such as:
- Price information: the farmer can consult the evolution of market prices of different products, which allows them to plan what to grow based on the expected profitability.
- Weather conditions: the app crosses weather data with the specific needs of each type of crop, helping to anticipate irrigation, protection or harvests.
- Agenda of activities of interest: agricultural fairs, technical conferences, calls for grants... All relevant information for the sector, centralized in one place.

Figure 2. Visual structure of the Cultiva+Tenerife application. Source: presentation by Cultiva+Tenerife in the Webinar "From data to innovation: Reuse ideas awarded in the I Open Data Contest of the Cabildo de Tenerife, Universidad de la Laguna".
Something that was highlighted as valuable about this project in the webinar is its focus on a group that has historically had less access to digital tools: farmers in Tenerife. The proposal does not seek to complicate their day-to-day life with unnecessary technology, but to simplify decisions that today are often made by eye or with incomplete information. Precision agriculture is no longer just a matter for large farms: with open data and a good application, it can be within the reach of any local producer.
Analysis of trends and models on tourism in Tenerife: when the data reveal a crisis
The second winning project addresses one of the most complex and urgent issues in the reality of Tenerife: the relationship between tourism, housing and the labour market. An equation with multiple variables that directly affects the quality of life of residents and that, until now, was difficult to analyse rigorously without access to reliable data.
The starting point of the project is revealing: in June 2024, 35% of the new employment contracts signed in Tenerife corresponded to the hospitality sector. A figure that perfectly illustrates the structural dependence of the island's economy on tourism, but which also opens up uncomfortable questions: to what extent is tourism growth transforming the housing market? Is it displacing habitual residents from certain areas? How will tourist arrivals evolve in the coming years?
This project proposes to answer these questions through an analysis and prediction model built with data science tools. Its developer proposes to use data such as the number of tourists staying in Tenerife according to category and area of establishment, available in datos.tenerife.es, to build models with Python and NumPy that allow identifying trends and projecting future scenarios.
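To give an idea of what such a model could look like, here is a minimal sketch that fits a linear trend to a monthly series of tourist stays with NumPy and projects it forward. The file name and column names are hypothetical placeholders rather than the actual fields of the datos.tenerife.es dataset, and a real model would also need to handle seasonality and exogenous variables.

import numpy as np
import pandas as pd

# Hypothetical monthly extract: one row per month with the total number of tourist stays.
df = pd.read_csv("tourist_stays_tenerife.csv", parse_dates=["month"]).sort_values("month")

# Encode time as consecutive month indices and fit a simple linear trend.
t = np.arange(len(df))
slope, intercept = np.polyfit(t, df["tourists"].to_numpy(), deg=1)

# Project the next 12 months from the fitted trend.
future_t = np.arange(len(df), len(df) + 12)
projection = intercept + slope * future_t

print(f"Estimated monthly change: {slope:,.0f} tourist stays per month")
print("12-month projection:", np.round(projection).astype(int))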
The objectives of the project are ambitious but concrete:
- Analyse the relationship between tourist demand and accommodation supply, identifying which areas of the island suffer the greatest pressure and at what times of the year.
- To develop a predictive model capable of estimating the future arrival of tourists and their impact on the tourist housing sector.
- Contribute to mitigating the housing crisis by providing data and analysis that allow us to understand how tourism is affecting the availability of housing for residents.
- To support business and urban planning, offering companies, investors and administrations an analysis tool that facilitates strategic decision-making.
In short, it is a matter of putting the intelligence of data at the service of one of the most current debates that Tenerife has on the table.
The university as a bridge between data and society
The choice of the Cajasiete Big Data, Open Data and Blockchain Chair of the University of La Laguna as a space to give visibility to the winners is in itself a message: the University has a key role in the construction of the open data ecosystem in Tenerife.
This chair has been working for years on the border between academic research and the practical application of technologies such as big data analysis, blockchain or the reuse of public information. Their involvement in this competition and in the dissemination of its results reinforces the idea that open data is also a valuable resource for training, research and local economic development.
The success of this first call has confirmed that there was a real demand for this type of initiative. So much so that the Cabildo has already launched the II Open Data Contest: APP Development, which gives continuity to the process by taking ideas to the next level: the development of functional applications.
If in the first edition ideas and conceptual proposals were awarded, in this second edition the challenge is to build real solutions, with code, user interface and proven functionalities. The economic endowment is 6,000 euros divided into three prizes.
Projects such as Cultiva+ Tenerife or the Analysis of the impact of tourism on housing show that there are ideas with the potential to become useful and sustainable tools. This second phase is the opportunity to materialize them.
"I'm going to upload a CSV file for you. I want you to analyze it and summarize the most relevant conclusions you can draw from the data". A few years ago, data analysis was the territory of those who knew how to write code and use complex technical environments, and such a request would have required programming or advanced Excel skills. Today, being able to analyse data files in a short time with AI tools gives us great professional autonomy. Asking questions, contrasting preliminary ideas and exploring information first-hand changes our relationship with knowledge, especially because we stop depending on intermediaries to obtain answers. Gaining the ability to analyze data with AI independently speeds up processes, but it can also cause us to become overconfident in conclusions.
Based on the example of a raw data file, we are going to review possibilities, precautions and basic guidelines for exploring the information without jumping to conclusions too quickly.
The file:
To show an example of data analysis with AI we will use a file from the National Institute of Statistics (INE) that collects information on tourist flows in Europe, specifically on occupancy in rural tourism accommodation. The data file contains information from January 2001 to December 2025. It contains disaggregations by sex, age and autonomous community or city, which allows comparative analyses to be carried out over time. At the time of writing, the last update to this dataset was on January 28, 2026.

Figure 1. Dataset information. Source: National Institute of Statistics (INE).
1. Initial exploration
For this first exploration we are going to use a free version of Claude, the AI assistant developed by Anthropic. It is one of the most advanced language models in reasoning and analysis benchmarks, which makes it especially suitable for this exercise, and it is one of the options most widely used by the community for tasks that involve code.
Let's assume that we are facing the data file for the first time. We know in broad strokes what it contains, but we do not know the structure of the information. Our first prompt, therefore, should focus on describing it:
PROMPT: I want to work with a data file on occupancy in rural tourism accommodation. Explain to me what structure the file has: what variables it contains, what each one measures and what possible relationships exist between them. Also point out possible missing values or elements that require clarification.

Figure 2. Initial exploration of the data file with Claude. Source: Claude.
Once Claude has given us the general idea and explanation of the variables, it is good practice to open the file and do a quick check. The objective is to assess that, at a minimum, the number of rows, the number of columns, the names of the variables, the time period and the type of data coincide with what the model has told us.
If we detect any errors at this point, the LLM may not be reading the data correctly. If the error persists after trying in another conversation, it is a sign that there is something in the file that makes it difficult to read automatically. In this case, it is best not to continue with the analysis: the conclusions may look plausible, but they will be based on misinterpreted data.
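If we prefer to double-check with code, a few lines of pandas provide the same quick structural check. This is only a sketch: the file name, separator and encoding are placeholders that depend on the actual CSV downloaded from the INE.

import pandas as pd

# Placeholder file name, separator and encoding: adjust them to the downloaded INE file.
df = pd.read_csv("rural_tourism_occupancy.csv", sep=";", encoding="utf-8")

print(df.shape)              # number of rows and columns, to compare with the model's description
print(df.columns.tolist())   # variable names
print(df.dtypes)             # data type of each column
print(df.head())             # first rows, useful to check the time period and formats
print(df.isna().sum())       # missing values per column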
2. Anomaly management
Second, if we have discovered anomalies, it is common to document them and decide how to handle them before proceeding with the analysis. We can ask the model to suggest what to do, but the final decisions will be ours. For example (a code sketch of these operations follows the list):
- Missing values: if there are empty cells, we need to decide whether to fill them with an "average" value from the column or simply delete those rows.
- Duplicates: we have to eliminate repeated rows or rows that do not provide new information.
- Formatting errors or inconsistencies: we must correct these so that the variables are coherent and comparable. For example, dates represented in different formats.
- Outliers: if a number appears that does not make sense or is exaggeratedly different from the rest, we have to decide whether to correct it, ignore it or treat it as it is.
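For readers who want to see what these decisions translate into, this is roughly how the four operations look in pandas. It is only a sketch with placeholder column names ("Total" and "Periodo" stand in for the real INE fields), and each step should be applied only after weighing the pros and cons discussed above.

import pandas as pd

df = pd.read_csv("rural_tourism_occupancy.csv", sep=";")

# Missing values: either fill with the column mean or drop the affected rows.
df["Total"] = df["Total"].fillna(df["Total"].mean())
# df = df.dropna(subset=["Total"])        # alternative: remove incomplete rows

# Duplicates: drop fully repeated rows that add no information.
df = df.drop_duplicates()

# Formatting inconsistencies: normalise dates to a single standard format.
df["Periodo"] = pd.to_datetime(df["Periodo"], errors="coerce")

# Outliers: flag values far from the rest before deciding whether to correct, ignore or keep them.
z = (df["Total"] - df["Total"].mean()) / df["Total"].std()
print(df[z.abs() > 3])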

Figure 3. Example of missing values analysis with Claude. Source: Claude.
In the case of our file, for example, we have detected that the missing values in the Total variable for Ceuta and Melilla are structural: there is no rural tourism registered in these cities, so we could exclude them from the analysis.
Before making the decision, a good practice at this point is to ask the LLM for the pros and cons of modifying the data. The answer can give us some clue as to which is the best option, or indicate some inconvenience that we had not taken into account.

Figure 4. Claude's analysis of whether or not to remove the values. Source: Claude.
If we decide to go ahead and exclude the cities of Ceuta and Melilla from the analysis, Claude can help us make this modification directly on the file. The prompt would be as follows:
PROMPT: Remove all rows corresponding to Ceuta and Melilla from the file, so that the rest of the data remains intact. Also explain the steps you are following so I can review them.

Figure 5. Step-by-step data modification in Claude. Source: Claude.
At this point, Claude offers the modified file for download, so a good checking practice is to manually validate that the operation was done correctly. For example, compare the number of rows in the two files, or check some rows at random against the original to make sure that the data has not been corrupted.
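That manual validation can also be scripted. A minimal sketch, assuming we keep the original file and the one returned by Claude side by side; the column name used to identify Ceuta and Melilla is a placeholder for the real INE field.

import pandas as pd

original = pd.read_csv("rural_tourism_occupancy.csv", sep=";")
modified = pd.read_csv("rural_tourism_without_ceuta_melilla.csv", sep=";")

# "Comunidad" is a placeholder for the real INE column that names the territory.
removed = original["Comunidad"].str.contains("Ceuta|Melilla", na=False).sum()
print(len(original) - len(modified), "rows removed; expected:", removed)

# Spot-check a few random rows of the modified file against the original.
print(modified.sample(5, random_state=0))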
3. First questions and visualizations
If the result so far is satisfactory, we can start exploring the data, asking initial questions and looking for interesting patterns. When starting the exploration, it is best to ask broad, clear questions that are easy to answer with the data, because they give us a first overall picture.
PROMPT: Work with the file without Ceuta and Melilla from now on. Which have been the five communities with the most rural tourism over the whole period?

Figure 6. Claude's response to the five communities with the most rural tourism in the period. Source: Claude.
Finally, we can ask Claude to help us visualize the data. Instead of specifying a particular chart type, we give the model the freedom to choose the format that best displays the information.
PROMPT: Can you visualize this information on a graph? Choose the most appropriate format to represent the data.

Figure 7. Graph prepared by Claude to represent the information. Source: Claude.
Here, the screen unfolds: on the left, we can continue with the conversation or download the file, while on the right we can view the graph directly. Claude has generated a very visual and ready-to-use horizontal bar chart. The colors differentiate the communities and the date range and type of data are correctly indicated.
What happens if we ask it to change the color palette of the chart to an inappropriate one? In this case, for example, we are going to ask it for a series of pastel shades that are hardly distinguishable from one another.
PROMPT: Can you change the color palette of the chart to this? #E8D1C5, #EDDCD2, #FFF1E6, #F0EFEB, #EEDDD3

Figure 8. Adjustments made to the graph by Claude to represent the information. Source: Claude.
Faced with the challenge, Claude intelligently adjusts the graphic itself, darkening the background and changing the text on the labels to maintain readability and contrast.
All of the above exercise has been done with Claude Sonnet 4.6, which is not Anthropic's highest quality model. Its higher versions, such as Claude Opus 4.6, have greater reasoning capacity, deeper understanding and finer results. In addition, there are many other tools for working with data and visualizations with AI, such as Julius or Quadratic. Although the possibilities they offer are almost endless, when we work with data it is still essential to maintain our own methodology and criteria.
Contextualizing the data we are analyzing in real life and connecting it with other knowledge is not a task that can be delegated; we need to have at least a basic prior idea of what we want to achieve with the analysis in order to convey it to the system. This will allow us to ask better questions, properly interpret the results and, therefore, prompt more effectively.
Content created by Carmen Torrijos, expert in AI applied to language and communication. The content and views expressed in this publication are the sole responsibility of the author.
Gasofinder is a modern and efficient web application designed to help users find the cheapest gas stations closest to their location in real time. Using official data and an interactive map, the application helps save money on every refuel. Specifically, it offers:
- Interactive Map:
  - Clear Visualization: uses OpenStreetMap maps with a clean design (Carto Voyager).
  - Smart Markers: 🟢 green for low prices, 🟡 yellow for average prices, 🔴 red for high prices.
  - Special Icons: quickly identifies the cheapest (⭐) and the most expensive (⚠️) gas station within the visible area.
- Savings and Real-Time Pricing: retrieves updated prices directly from the Ministry for the Ecological Transition.
  - Tank Fill Calculation: enter your tank capacity (default 55L) to see how much it will cost to fill it.
  - Savings Estimation: shows how much you save compared to the most expensive option in the area.
  - Price Thermometer: a visual bar that indicates whether a station's price is good, average, or bad compared to local minimums and maximums.
- Customization and Filters:
  - Fuel Type: filter by Diesel A, Gasoline 95 E5, Gasoline 98 E5, or Premium Diesel.
  - Tank Size: adjustable for personalized calculations.
  - Charging Points Link: direct access to the official electric vehicle charging points map.
- Navigation and Location:
  - Geolocation: automatically detects your location (first approximately, then precisely).
  - Routes: automatically calculates the distance and travel time to the selected gas station.
  - GPS Integration: opens the location directly in Google Maps, Waze, or Apple Maps with a single click.
  - Lock Mode: allows you to "pin" a selected gas station so it does not change automatically while moving the map.
Icon Clarification
- Location (📍): shows your current position.
- Star (⭐): the most affordable visible option.
- Alert (⚠️): the most expensive visible option.
Centraldecomunicacion.es is a Spanish platform specializing in company databases and company listings ready for use in B2B commercial prospecting, market analysis, segmentation by sector/province, and growth campaigns. They work with public signals and verifiable digital presence to build actionable business records: location, activity, online reputation, and contact channels when available. The data is delivered in compatible formats (Excel/CSV) for integration into standard workflows (CRM, email marketing, BI/Excel, automations).
The databases include fields such as: name, category/sector, description, full address, city/province, coordinates, Google Maps URL, as well as phone number, corporate email, website, and social media when publicly available. Reputation signals such as ratings, number of reviews, and in some cases, review text and hours of operation are also included.
A key differentiator is the focus on quality: verification methodologies and tools (including email verification) are published to reduce bounce rates and improve the actual usefulness of the data. In addition, open resources such as a list of postal codes and utilities related to categorization/segmentation are offered.
Introduction
Every year there are tens of thousands of accidents in Spain, in which thousands of people are injured to varying degrees, and which occur in very different circumstances, both in terms of the type of road and the type of accident.
Many of the statistics related to these parameters are collected in the databases of the Directorate General of Traffic (DGT) and some of them in the catalogue hosted in datos.gob.es.
In this exercise, we will examine the content of the DGT accident database for the year 2024 in order to make a series of basic visualizations that allow us to see quickly and intuitively which facts stand out regarding the incidence of accidents and their consequences in that year.
To do this, we are going to develop Python code that allows us to read and calculate basic metrics regarding the total number of victims, the particularities of the infrastructures as well as the different cases of accidents. And once we have this data available, we will visualize it using the Javascript D3.js library, which allows us both to represent data in its most traditional form and in more contemporary designs, common in the press, thus favoring a narrative that is fluid in style and coherent in content.
In the Python environment we will use frequently used libraries such as Numpy, for basic calculations – sums, maximums and minimums – and Pandas, to structure the data intuitively, facilitating both its organization and its transformation. We will also work with Datetime, both to convert the input data into standard Python date types and to aggregate the data in an easy and intuitive way. In this way we will learn how to open any data file in CSV format, structure it in an orderly way and carry out basic transformations and operations in a simple way.
In the Javascript environment we will develop notebooks in D3.js thanks to the use of Observable, an open and free initiative, to be able to execute Javascript code directly in a web interface, without having to resort to local servers or complex installations. In different notebooks we will create classic visualizations – such as time series on Cartesian axes or maps – along with other proposals such as bubble distributions or elements stacked by categories.
In Figure 1 you can see the main stages of this exercise, from the reading of the data within the DGT file, to the operations and output variables in JSON format, which will in turn serve us in a Javascript environment to be able to develop the visualizations in D3.js.

Figure 1. Steps to be followed when performing this exercise, from reading the input CSV file, postprocessing the data with Python, creating an output in JSON format and ultimately displaying the information in D3.js
Access to the Github repositories, GoogleColab notebook and Observable notebooks is done via:
Access to the Github repository
Access to GoogleColab notebook
Access to Observable notebooks
Development Process
1. Reading the data file
The first step will be to read the DGT file containing all the accident records for the year 2024. This step will allow us to identify the fields of interest and, especially, the format they are in. We will be able to identify whether any transformation is required, particularly for the date information, given how it is structured in the original file.
We will also see how to translate the codes of many of the categories offered by the DGT, so that we can make a real interpretation beyond the numbers of categories such as type of accident, type of road or ownership of the road.
Once we understand the structure and content of the data, we can start operating with it.
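A minimal sketch of this first step, assuming the 2024 file has been downloaded locally; the file name, separator, encoding and the code-to-label mapping below are illustrative placeholders, and the real correspondences must be taken from the DGT documentation:

import pandas as pd

# Illustrative file name and separator; adjust them to the actual DGT download.
df = pd.read_csv("accidentes_dgt_2024.csv", sep=";", encoding="latin-1")

print(df.shape)    # number of records and fields
print(df.dtypes)   # format of each field, e.g. to spot how dates are stored

# Example of translating DGT numeric codes into readable categories.
# Placeholder dictionary and column name: use the official DGT code tables instead.
road_ownership = {1: "State", 2: "Autonomous community", 3: "Provincial", 4: "Municipal"}
df["road_ownership_label"] = df["road_ownership_code"].map(road_ownership)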
2. Calculating Metrics
The Pandas Python library allows us to operate with the different columns of data and perform basic calculations that will be representative enough to minimally understand the casuistry of accidents on Spanish roads.
In this section, three types of calculations will be made (a brief pandas sketch of them follows the list below).
- The first of these will be the calculation of the total number of victims per hour of the day for each of the days of the week. The DGT database is structured by day of the week, so we will also use this time scale to represent the data in a series. It should be noted that a victim is considered to be any person who has died or who is diagnosed as seriously or lightly injured.
- The second calculation will be the sum of the total of accidents for different categories, such as road ownership, type of accident or type of road. This will allow us to see which are the conditions in which accidents are most frequent.
- The third calculation will be the number of accidents per municipality. In this case we will carry out the calculation restricted to the province of Valencia as an example, and which would be applicable to any province or municipality of our interest. In this case we will observe the differences between urban and non-urban centers, as well as those municipalities through which the main communication routes pass.
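A minimal sketch of these three calculations with Pandas, with the results exported to JSON so they can later be read from the D3.js notebooks; the column names are placeholders for the actual DGT fields:

import pandas as pd

df = pd.read_csv("accidentes_dgt_2024.csv", sep=";", encoding="latin-1")

# 1. Total number of victims per hour of the day for each day of the week.
victims_by_hour = (
    df.groupby(["day_of_week", "hour"])["total_victims"].sum().reset_index()
)

# 2. Total number of accidents per category (here, road ownership as an example).
accidents_by_ownership = df["road_ownership"].value_counts()

# 3. Number of accidents per municipality, restricted to the province of Valencia.
valencia = df[df["province"] == "Valencia"]
accidents_by_municipality = valencia["municipality"].value_counts()

# Export the metrics in JSON format for the Observable / D3.js notebooks.
victims_by_hour.to_json("victims_by_hour.json", orient="records")
accidents_by_ownership.to_json("accidents_by_ownership.json")
accidents_by_municipality.to_json("accidents_by_municipality.json")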
3. Visualization Design
Once we have calculated the metrics of interest, we will develop four visualization exercises in D3.js. To do this, we will export the result of the metrics in JSON format and create notebooks in Observable. Specifically, we made the following visualizations:
- Time series with the total number of casualties in each hour and day of the week, with an interactive drop-down menu to select the day of the week of interest. In addition to the curve that describes the number of victims, we will draw the uncertainty of all the days of the week on the background of the graph, so that the daily time series is framed in the context of the whole week as a reference.
- Map of the province of Valencia with the total number of accidents by municipality.
- Bubble diagram, with the different magnitudes of the different types of accidents with the total number of accidents in each case written in detail.
- Stacked dot diagram, where we accumulate circles or any other geometric shape for the different road ownership and its total number of accidents within the framework of each ownership.
- Mountain ridge diagram, where the height of each mountain represents the total number of victims on a logarithmic scale.
Visualizing the metrics
The result of this exercise can be seen graphically and explicitly in the form of visualizations made for the web format and accessible from a web interface, both for its development and for its subsequent publication. These visualizations are gathered as Observable notebooks here:
Access to Observable notebooks
In Figure 2 we have the result of the time series of the total number of victims with respect to the time of day for different days of the week. The time series is framed within the uncertainty of the total number of days of the week, to give an idea of the margin of variability that we can have depending on the time of day.

Figure 2. Time series of total accident casualties by time of day for all days of the week in 2024. The light blue background indicates the uncertainty associated with all the days of the week as context, with a drop-down menu to select the day of the week.
In Figure 3 we can see the map of the province of Valencia with a colour intensity proportional to the number of accidents in each municipality. Those municipalities in which no accidents have been recorded appear in white. Intuitively you can guess the layout of the main roads that cross the province, both the road to the east of the city of Valencia in the direction of Madrid and the inland road to the south of the city in the direction of Alicante.

Figure 3. Map of the number of accidents by municipality in the province of Valencia in 2024.
In Figure 4 we see a geometric shape, the circle, associated with the types of accidents, with the detail of the number of accidents associated with each category. In this type of visualization, the most frequent accidents naturally emerge around the center of the diagram, while minority or residual categories occupy the perimeter, giving a rounded shape to the set as a whole.

Figure 4. Bubble diagram of the number of accidents by accident type in 2024.
Figure 5 shows the traditional bar diagram, but this time broken down into smaller units, to refine the number of accidents associated with the ownership of the road where they have occurred. This type of diagram allows us to discern small differences between similar quantities, preserving the general message that we obtain from a calculation of these characteristics.

Figure 5. Bar diagram with dot discretization for the number of accidents by road ownership in 2024
Figure 6 shows the total number of victims on a logarithmic scale based on the height of each mountain for each type of road.

Figure 6. Mountain ridge diagram, displaying the total number of victims by each type of road in 2024.
Lessons learned
Through these steps we will learn a whole series of transversal skills that allow us to work with datasets presented in CSV format in columns, a very popular format that we can both analyse and visualise. These lessons are specifically:
- Universality of reading and structuring data: the use of tools such as Python, with its Numpy and Pandas libraries, allows access to data in detail and structured in an orderly and intuitive way with a few lines of code.
- Simple calculations in Pandas: the Python library itself allows simple but essential calculations for the preliminary interpretation of results.
- Datetime format: through this Python library we can become familiar with the standard date format, and thus perform all kinds of transformations, filters and selections that interest us the most in any time interval.
- JSON format: once we decide to give space to our visualizations on the web, learning the structure and use of the JSON format is very useful given its wide use in all types of applications and web architectures.
- Spectrum of D3.js possibilities: this Javascript library allows us to explore everything from the most traditional and conservative designs to the most creative ones, thanks to its principles based on the most basic shapes, without templates or predefined diagrams.
Conclusions and next steps
We have learned to read and structure data according to the standards of the most widely used formats in the world of analysis and visualization. This exercise also serves as an introductory module to the world of D3.js, a very versatile, current and popular tool within the world of storytelling and data visualization at all levels.
In order to move forward in this exercise, it is recommended:
- For analysts and developers, it is possible to dispense with the Pandas library and structure the data with more elementary Python objects such as arrays and matrices, looking for which functions and which operators allow the same tasks that Pandas does to be performed but in a more fundamental way, especially if we think of production environments for which we need the fewest possible libraries to lighten the application.
- For the creators of visualizations, information on municipalities can also be projected onto existing cartographic databases such as OpenStreetMap and thus link the incidence of accidents to orographic features or infrastructures already reflected in these cartographic databases. For the magnitudes of the accident numbers, you can explore Treemap diagrams or Voronoi diagrams and see if they convey the same message as the ones presented in this exercise.
Areas of application
The steps described in this exercise can become part of the everyday toolbox of the following profiles:
- Data analysts: here are the basic steps for the description of a data file in CSV format and the basic calculations to be carried out both in the date field and operations between variables of different columns. These tools can be used to introduce you to the world of data analysis and help in those first steps when facing a dataset.
- Scientists and research staff: the universality of the tools described here apply to a wide variety of data sources, such as that experienced in experimental sciences and observations or measurements of all kinds. These tools allow for a quick and rigorous analysis regardless of the field of knowledge in which you work.
- Web developers: the export of data in JSON format as well as the Javascript code offered in Observable notebooks are easily integrated into all types of environments (Svelte, React, Angular, Vue) and allow the creation of visualizations on a website in a simple and intuitive way.
- Journalists: covering the entire life cycle of a data file, from its reading to its visualization, gives the journalist or researcher independence when it comes to evaluating and interpreting the data on their own, without depending on external technical resources. The creation of the map by municipalities opens the door to using any other similar data, such as electoral processes, with the same output format to show geographical variability with respect to any type of magnitude.
- Graphic Designers: Handling visualization tools with a wide degree of freedom allows designers to cultivate all their creativity within the rigor and accuracy that data requires.
Web viewer that displays the fiber deployments of all PEBA and UNICO programs on a single map, based on publicly available data. Each area has the background color of the awarded operator, and the border is a different color for each program. In the case of the 2013-2019 PEBA plan, as deployments are assigned to individual population entities, a marker is shown with the location obtained from the CNIG. In addition, when the map is not zoomed in, a heat map is displayed showing the distribution of deployments by area.
This visualization avoids having to compare different map viewers if what we are interested in is seeing which operators reach which areas, or simply getting an overview of which deployments are pending in my area. It also allows us to consult details such as the updated completion date, which were previously only available in the different Excel files for each program. I also think it could be useful for analyzing how the areas are distributed among the different programs (for example, whether an area covered in UNICO 2021 has nearby areas in UNICO 2022 covered by another operator), or even possible overlaps (for example, due to areas that were not executed in previous programs).
AI agents (built with frameworks such as Google ADK or LangChain) are the so-called "brains". But these brains cannot act on the real world without "hands" to perform API requests or database queries. These "hands" are the tools.
The challenge is the following: how do you connect the brain with the hands in a standard, decoupled and scalable fashion? The answer is the Model Context Protocol (MCP).
As a practical exercise, we built a conversational agent system that explores the national open data repository hosted at datos.gob.es through natural language questions, thereby smoothing access to open data.
In this practical exercise, the main objective is to illustrate, step by step, how to build an independent tools server that interacts with the MCP protocol.
To make this exercise tangible and not just theoretical, we will use FastMCP to build the server. To prove that our server works, we will create a simple agent with Google ADK that uses it. The use case (querying the datos.gob.es API) illustrates this connection between tools and agents. The real learning lies in the architecture, which you could reuse for any API or database.
Below are the technologies we will use and a diagram showing how the different components are related to each other.
- FastMCP (mcp.server.fastmcp): a lightweight implementation of the MCP protocol that allows you to create tool servers with very little code using Python decorators. It is the "main character" of the exercise.
- Google ADK (Agent Development Kit): a framework to define the AI agent, its prompt, and connect it to the tools. It is the "client" that tests our server.
- FastAPI: used to serve the agent as a REST API with an interactive web interface.
- httpx: used to make asynchronous calls to the external datos.gob.es API.
- Docker and Docker Compose: used to package and orchestrate the two microservices, allowing them to run and communicate in isolation.

Figure 1. Decoupled architecture with MCP communication.
Figure 1 illustrates a decoupled architecture divided into four main components that communicate via the MCP protocol. When the user makes a natural language query, the ADK Agent (based on Google Gemini) processes the intent and communicates with the MCP server through the MCP Protocol, which acts as a standardized intermediary. The MCP server exposes four specialized tools (search datasets, list topics, search by topic, and get details) that encapsulate all the business logic for interacting with the external datos.gob.es API. Once the tools execute the required queries and receive the data from the national catalog, the result is propagated back to the agent, which finally generates a user-friendly response, thus completing the communication cycle between the “brain” (agent) and the “hands” (tools).
Access the data lab repository on GitHub
Run the data pre-processing code on Google Colab
The architecture: MCP server and consumer agent
The key to this exercise is understanding the client–server relationship:
- The Server (Backend): it is the protagonist of this exercise. Its only job is to define the business logic (the “tools”) and expose them to the outside world using the standard MCP “contract.” It is responsible for encapsulating all the logic for communicating with the datos.gob.es API.
- The Agent (Frontend): it is the “client” or “consumer” of our server. Its role in this exercise is to prove that our MCP server works. We use it to connect, discover the tools that the server offers, and call them.
- The MCP Protocol: it is the “language” or “contract” that allows the agent and the server to understand each other without needing to know the internal details of the other.
Development process
The core of the exercise is divided into three parts: creating the server, creating a client to test it, and running them.
1. The tool server (the backend with MCP)
This is where the business logic lives and the main focus of this tutorial. In the main file (server.py), we define simple Python functions and use the FastMCP @mcp.tool decorator to expose them as consumable “tools.”
The description we add to the decorator is crucial, since it is the documentation that any MCP client (including our ADK agent) will read to know when and how to use each tool.
The tools we will define in this exercise are the following (a minimal sketch of the server follows the list):
- buscar_datasets(titulo: str): to search for datasets by keywords in the title.
- listar_tematicas(): to discover which data categories exist.
- buscar_por_tematica(tematica_id: str): to find datasets for a specific topic.
- obtener_detalle_dataset(dataset_id: str): to retrieve the complete information for a dataset.
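As an illustration, this is roughly what server.py looks like with FastMCP, showing only the first tool. The datos.gob.es endpoint and the response handling are simplified assumptions for this sketch and should be checked against the repository code and the official API documentation; the transport name may also vary between SDK versions.

from mcp.server.fastmcp import FastMCP
import httpx

mcp = FastMCP("datos-gob-es")

@mcp.tool()
async def buscar_datasets(titulo: str) -> str:
    """Search datos.gob.es datasets whose title contains the given keywords."""
    # Illustrative endpoint: check the datos.gob.es API documentation for the exact route.
    url = f"https://datos.gob.es/apidata/catalog/dataset/title/{titulo}"
    async with httpx.AsyncClient() as client:
        response = await client.get(url, params={"_pageSize": 10})
        response.raise_for_status()
    return response.text

if __name__ == "__main__":
    # Expose the server over HTTP so the ADK agent can reach it at /mcp.
    mcp.run(transport="streamable-http")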
2. The consumer agent (the frontend with Google ADK)
Once our MCP server is built, we need a way to test it. This is where Google ADK comes in. We use it to create a simple “consumer agent.”
The magic of the connection happens in the tools argument. Instead of defining the tools locally, we simply pass it the URL of our MCP server. When the agent starts, it will query that URL, read the MCP “contract,” and automatically know which tools are available and how to use them.
# Example configuration in agent.py
# (imports omitted here; LlmAgent, MCPToolset and StreamableHTTPConnectionParams
#  come from the google.adk packages)
root_agent = LlmAgent(
    ...
    instruction="Eres un asistente especializado en datos.gob.es...",
    tools=[
        MCPToolset(
            connection_params=StreamableHTTPConnectionParams(
                url="http://mcp-server:8000/mcp",
            ),
        )
    ],
)
3. Orchestration with Docker Compose
Finally, to run our MCP Server and the consumer agent together, we use docker-compose.yml. Docker Compose takes care of building the images for each service, creating a private network so they can communicate (which is why the agent can call http://mcp-server:8000), and exposing the necessary ports.
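As orientation, a docker-compose.yml for this setup could look roughly as follows (the service names, folder layout and ports are assumptions). The key detail is that the service name mcp-server becomes the hostname the agent uses in http://mcp-server:8000/mcp.

# docker-compose.yml – illustrative sketch; folder names and ports are assumptions
services:
  mcp-server:
    build: ./mcp-server          # folder containing server.py and its Dockerfile
    ports:
      - "8000:8000"              # MCP endpoint exposed at http://localhost:8000/mcp

  agent:
    build: ./agent               # folder containing agent.py, the FastAPI app and its Dockerfile
    ports:
      - "8080:8080"              # web interface at http://localhost:8080
    depends_on:
      - mcp-server               # inside the Compose network the agent reaches it as http://mcp-server:8000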
Testing the MCP server in action
Once we run docker-compose up --build, we can access the agent’s web interface at http://localhost:8080.
The goal of this test is not only to see whether the bot responds correctly, but to verify that our MCP server works properly and that the ADK agent (our test client) can discover and use the tools it exposes.
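Besides testing through the web interface, we can verify the server directly with a small script based on the MCP Python SDK client. The following is a sketch that assumes a recent version of the mcp package with streamable HTTP support; it simply connects, reads the “contract” and prints the available tools.

# test_client.py – minimal sketch for inspecting the MCP server (assumes the official mcp SDK)
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    # Connect to the server exposed by Docker Compose on the host machine
    async with streamablehttp_client("http://localhost:8000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            # Print the "contract": names and descriptions that any MCP client would discover
            for tool in result.tools:
                print(tool.name, "->", tool.description)

if __name__ == "__main__":
    asyncio.run(main())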

Figure 2. Screenshot of the agent showing its tools.
The true power of decoupling becomes evident when the agent logically chains together the tools provided by our server.

Figure 3. Screenshot of the agent showing the joint use of tools.
What can we learn?
The goal of this exercise is to learn the fundamentals of a modern agent architecture, focusing on the tool server. Specifically:
- How to build an MCP server: how to create a tool server from scratch that speaks MCP, using decorators such as @mcp.tool.
- The decoupled architecture pattern: the fundamental pattern of separating the “brain” (LLM) from the “tools” (business logic).
- Dynamic tool discovery: how an agent (in this case, an ADK agent) can dynamically connect to an MCP server to discover and use tools.
- External API integration: the process of “wrapping” a complex API (such as datos.gob.es) in simple functions within a tool server.
- Orchestration with Docker: how to manage a microservices project for development.
Conclusions and future work
We have built a robust and functional MCP tool server. The real value of this exercise lies in the how: a scalable architecture centered around a tool server that speaks a standard protocol.
This MCP-based architecture is incredibly flexible. The datos.gob.es use case is just one example. We could easily:
- Change the use case: replace server.py with one that connects to an internal database or the Spotify API, and any agent that speaks MCP (not just ADK) could use it.
- Change the “brain”: swap the ADK agent for a LangChain agent or any other MCP client, and our tool server would continue to work unchanged.
For those interested in taking this work to the next level, the possibilities focus on improving the MCP server:
- Implement more tools: add filters by format, publisher, or date to the MCP server.
- Integrate caching: use Redis in the MCP server to cache API responses and improve speed (a sketch of this idea follows this list).
- Add persistence: store chat history in a database (this would be on the agent side).
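As a quick illustration of the caching idea mentioned above (not part of the current repository), a search tool could be wrapped with a Redis lookup roughly like this; the Redis URL, key naming and one-hour expiry are assumptions.

# Illustrative caching sketch for the MCP server (assumes a reachable Redis instance)
import httpx
import redis.asyncio as redis

cache = redis.from_url("redis://redis:6379", decode_responses=True)

async def buscar_datasets_cacheado(titulo: str) -> str:
    """Returns the cached API response for a title search, querying datos.gob.es only on a miss."""
    key = f"datasets:titulo:{titulo.lower()}"
    cached = await cache.get(key)
    if cached is not None:
        return cached
    async with httpx.AsyncClient(timeout=30) as client:
        response = await client.get(f"https://datos.gob.es/apidata/catalog/dataset/title/{titulo}")
        response.raise_for_status()
    await cache.set(key, response.text, ex=3600)  # keep the result for one hour
    return response.text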
Beyond these technical improvements, this architecture opens the door to many applications across very different contexts.
- Journalists and academics can have research assistants that help them discover relevant datasets in seconds.
- Transparency organizations can build monitoring tools that automatically detect new publications of public procurement or budget data.
- Consulting firms and business intelligence teams can develop systems that cross-reference information from multiple government sources to produce sector reports.
- Even in education, this architecture serves as a didactic foundation for teaching advanced concepts such as asynchronous programming, API integration, and AI agent design.
The pattern we have built—a decoupled tool server that speaks a standard protocol—is the foundation on which you can develop solutions tailored to your specific needs, regardless of the domain or data source you are working with.
The Cabildo Insular de Tenerife has announced the II Open Data Contest: Development of APPs, an initiative that rewards the creation of web and mobile applications that take advantage of the datasets available on its datos.tenerife.es portal. This call represents a new opportunity for developers, entrepreneurs and innovative entities that want to transform public information into digital solutions of value for society. In this post, we tell you the details about the competition.
A growing ecosystem: from ideas to applications
This initiative is part of the Cabildo de Tenerife's Open Data project, which promotes transparency, citizen participation and the generation of economic and social value through the reuse of public information.
The Cabildo has designed a strategy in two phases:
- The I Open Data Contest: Reuse Ideas (already held), focused on identifying creative proposals.
- The II Contest: Development of APPs (current call), which gives continuity to the process and seeks to materialise the ideas in functional applications.
This progressive approach makes it possible to build an innovation ecosystem that accompanies participants from conceptualization to the complete development of digital solutions.
The objective is to promote the creation of digital products and services that generate social and economic impact, while identifying new opportunities for innovation and entrepreneurship in the field of open data.
Awards and financial endowment
This contest has a total endowment of 6,000 euros, distributed across three prizes:
- First prize: 3,000 euros
- Second prize: 2,000 euros
- Third prize: 1,000 euros
Who can participate?
The call is open to:
- Natural persons: individual developers, designers, students, or anyone interested in the reuse of open data.
- Legal entities: startups, technology companies, cooperatives, associations or other organisations.
In both cases, the requirement is to present the development of an application based on open data from the Cabildo de Tenerife. The same natural or legal person may submit as many applications as they wish, either individually or jointly.
What kind of applications can be submitted?
Proposals must be web or mobile applications that use at least one dataset from the datos.tenerife.es portal. Some ideas that can serve as inspiration are:
- Applications to optimize transport and mobility on the island.
- Tools for visualising tourism or environmental data.
- Real-time citizen information services.
- Solutions to improve accessibility and social participation.
- Economic or demographic data analysis platforms.
Evaluation criteria: what does the jury assess?
The jury will evaluate the proposals considering the following criteria:
- Use of open data: degree of exploitation and integration of the datasets available in the portal.
- Impact and usefulness: value that the application brings to society, ability to solve real problems or improve existing services.
- Innovation and creativity: originality of the proposal and innovative nature of the proposed solution.
- Technical quality: code robustness, good programming practices, scalability and maintainability of the application.
- Design and usability: user experience (UX), attractive and intuitive visual design, guarantee of digital accessibility on Android and iOS devices.
How to participate: deadlines and submission
Applications can be submitted until March 10, 2026, three months from the publication of the call in the Official Gazette of the Province.
Regarding the required documentation, proposals must be submitted in digital format and include:
- Detailed technical description of the application.
- Report justifying the use of open data.
- Specification of the technological environments used.
- Video demonstration of how the application works.
- Complete source code.
- Technical summary sheet.
The organising institution recommends electronic submission through the Electronic Office of the Cabildo de Tenerife, although proposals can also be submitted in person at the authorised official registry offices. The complete rules and the official application form are available at the Cabildo's Electronic Office.
With this second call, the Cabildo de Tenerife consolidates its commitment to transparency, the reuse of public information and the creation of a digital innovation ecosystem. Initiatives like this demonstrate how open data can become a catalyst for entrepreneurship, citizen participation, and local economic development.