Blog

Cities are home to more than two-thirds of Europe's population and consume around 80% of its energy. In this context, climate change is having a particularly severe impact on urban environments, not only because of their density, but also because of their construction characteristics, their energy metabolism and the scarcity of vegetation in many consolidated areas. One of the most visible and worrying effects is the phenomenon known as the urban heat island (UHI).

Heat islands occur when the temperature in urban areas is significantly higher than in nearby rural or peri-urban areas, especially at night. This thermal differential can easily exceed five degrees Celsius under certain conditions. The consequences of this phenomenon go beyond thermal discomfort: it directly affects health, air quality, energy consumption, urban biodiversity and social equity.

In recent years, the availability of open data—especially geospatial data—has made it possible to characterize, map, and analyze urban heat islands with unprecedented accuracy. This article explores how this data can be used to design urban solutions adapted to climate change, with heat island mitigation as its focus.

What are urban heat islands and why do they occur?

Figure 1. Illustrative element on heat islands.

 

To intervene effectively in heat islands, it is necessary to know where, when and how they occur. Unlike other natural hazards, the heat island effect is not visible to the naked eye, and its intensity varies depending on the time of day, time of year, and specific weather conditions. It therefore requires a solid and dynamic knowledge base, which can only be built through the integration of diverse, up-to-date and territorialized data.

At this point, open geospatial data is a critical tool. Through satellite images, urban maps, meteorological data, cadastral cartography and other publicly accessible sets, it is possible to build urban thermal models, identify critical areas, estimate differential exposures and evaluate the impact of the measures adopted.

The main categories of data that allow us to address the phenomenon of heat islands from a territorial and interdisciplinary perspective are detailed below.

Types of geospatial data applicable to the study of the phenomenon

1. Earth observation satellite data

Thermal sensors on satellites such as Landsat 8/9 (NASA/USGS) or Sentinel-3 (Copernicus) make it possible to generate urban surface temperature maps with resolutions ranging from 30 to 1,000 metres. Although these images have spatial and temporal limitations, they are sufficient to detect patterns and trends, especially if combined with time series.

This data, accessible through platforms such as the Copernicus Open Access Hub or the USGS EarthExplorer, is essential for comparative studies between cities or for observing the temporal evolution of the same area.
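As an illustration of how this data can be exploited, the following minimal Python sketch reads a Landsat Collection 2 Level-2 surface temperature band downloaded from USGS EarthExplorer and summarises the temperature distribution of an urban scene. The file name is hypothetical, and the scale and offset values should be verified against each scene's metadata.

```python
import numpy as np
import rasterio  # reads GeoTIFF bands downloaded from USGS EarthExplorer

# Hypothetical file: Landsat 8/9 Collection 2 Level-2 surface temperature band (ST_B10)
SCENE = "LC08_L2SP_201032_20230715_ST_B10.TIF"

with rasterio.open(SCENE) as src:
    dn = src.read(1).astype("float64")
    nodata = src.nodata

# Convert digital numbers to degrees Celsius.
# Scale and offset are the values documented for Collection 2 Level-2 products;
# always check them against the metadata (MTL) file of your scene.
SCALE, OFFSET = 0.00341802, 149.0
valid = dn != nodata if nodata is not None else dn > 0
lst_celsius = dn[valid] * SCALE + OFFSET - 273.15

print(f"Pixels analysed: {lst_celsius.size}")
print(f"Median surface temperature: {np.median(lst_celsius):.1f} °C")
print(f"Hottest 5% of the scene above: {np.percentile(lst_celsius, 95):.1f} °C")
```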

2. Urban weather data

The network of AEMET stations, together with other automatic stations managed by autonomous communities or city councils, allows the evolution of air temperatures in different urban points to be analysed. In some cases, there are also citizen sensors or networks of sensors distributed in the urban space that allow real-time heat maps to be generated with high resolution.

3. Urban mapping and digital terrain models

Digital surface models (DSM), digital terrain models (DTM) and mappings derived from LIDAR allow the study of urban morphology, building density, street orientation, terrain slope and other factors that affect natural ventilation and heat accumulation. In Spain, this data is accessible through the National Center for Geographic Information (CNIG).

4. Land cover and land use databases

Databases such as Corine Land Cover of the Copernicus Programme, or land use maps at the regional level make it possible to distinguish between urbanised areas, green areas, impermeable surfaces and bodies of water. This information is key to calculating the degree of artificialization of an area and its relationship with the heat balance.

5. Inventories of urban trees and green spaces

Some municipalities publish on their open data portals the detailed inventory of urban trees, parks and gardens. These georeferenced data make it possible to analyse the effect of vegetation on thermal comfort, as well as to plan new plantations or green corridors.

6. Socioeconomic and vulnerability data

Data from the National Institute of Statistics (INE), together with the social information systems of autonomous communities and city councils, make it possible to identify the most vulnerable neighbourhoods from a social and economic point of view. Its cross-referencing with thermal data allows a climate justice dimension to be incorporated into decision-making.

Practical applications: how open data is used to act

Once the relevant data has been gathered and integrated, multiple analysis strategies can be applied to support public policies and urban projects with sustainability and equity criteria. Some of the main applications are described below.

·        Heat zone mapping and vulnerability maps: Using thermal imagery, weather data, and urban layers together, heat island intensity maps can be generated at the neighbourhood or block level. If these maps are combined with social, demographic and public health indicators, it is possible to build thermal vulnerability maps, which prioritise intervention in areas where high temperatures and high levels of social risk intersect. These maps make it possible, for example, to (a minimal sketch follows this list):

  • Identify priority neighbourhoods for urban greening.

  • Plan evacuation routes or shaded areas during heat waves.

  • Determine the optimal location of climate refuges.
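The sketch below illustrates this kind of cross-referencing: it computes the mean surface temperature per neighbourhood with zonal statistics and combines it with a social indicator to rank priority areas. File names, the column names (social_vulnerability, name) and the 50/50 weighting are assumptions for illustration only.

```python
import geopandas as gpd
from rasterstats import zonal_stats  # zonal statistics of a raster over polygons

# Hypothetical inputs: neighbourhood polygons (open administrative data)
# and a land surface temperature raster derived from satellite imagery.
neighbourhoods = gpd.read_file("neighbourhoods.geojson")
stats = zonal_stats(neighbourhoods.geometry, "lst_celsius.tif", stats=["mean"])
neighbourhoods["mean_lst"] = [s["mean"] for s in stats]

# Cross the thermal layer with a social vulnerability indicator (e.g. built from INE
# data, here assumed to be a column already present in the GeoDataFrame).
neighbourhoods["heat_rank"] = neighbourhoods["mean_lst"].rank(pct=True)
neighbourhoods["social_rank"] = neighbourhoods["social_vulnerability"].rank(pct=True)
neighbourhoods["priority"] = 0.5 * neighbourhoods["heat_rank"] + 0.5 * neighbourhoods["social_rank"]

print(neighbourhoods.sort_values("priority", ascending=False)[["name", "priority"]].head(10))
```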

·        Assessing the impact of nature-based solutions: Open data also makes it possible to monitor the effects of certain urban actions. For example, using time series of satellite images or temperature sensors, it is possible to assess how the creation of a park or the planting of trees on a street has modified the surface temperature. This ex-post evaluation approach makes it possible to justify public investments, adjust designs and scale effective solutions to other areas with similar conditions.

·        Urban modelling and climate simulations: three-dimensional urban models, built from open LIDAR data or cadastral mapping, make it possible to simulate the thermal behaviour of a neighbourhood or city under different climatic and urban scenarios. These simulations, combined with tools such as ENVI-met or Urban Weather Generator, are essential to support decision-making in urban planning.

Existing studies and analyses on urban heat islands: what has been done and what we can learn

During the last decade, multiple studies have been carried out in Spain and Europe that show how open data, especially geospatial data, allow the phenomenon of urban heat islands to be characterised and analysed. These works are fundamental not only because of their specific results, but also because they illustrate replicable and scalable methodologies. Some of the most relevant are described below.

Polytechnic University of Madrid study on surface temperature in Madrid

A team from the Department of Topographic Engineering and Cartography of the UPM analysed the evolution of surface temperature in the municipality of Madrid using thermal images from the Landsat 8 satellite in the summer period. The study focused on detecting spatial changes in warmer areas and relating them to land use, urban vegetation and building density.

Figure 2. Illustrative image. Source: generated with AI

Methodology:

Remote sensing techniques were applied to extract the surface temperature from the TIRS thermal channel of Landsat 8. Subsequently, a statistical correlation analysis was carried out between thermal values and variables such as the NDVI (vegetation index), type of land cover (CORINE data) and urban morphology.
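A minimal sketch of this kind of correlation analysis is shown below, using synthetic arrays in place of real co-registered satellite bands; the relationship coded into the synthetic data is purely illustrative.

```python
import numpy as np
from scipy.stats import pearsonr

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red), masking divisions by zero."""
    return np.where((nir + red) == 0, np.nan, (nir - red) / (nir + red))

def lst_ndvi_correlation(lst, ndvi_arr):
    """Pearson correlation between surface temperature and vegetation index."""
    mask = ~np.isnan(lst) & ~np.isnan(ndvi_arr)
    return pearsonr(lst[mask], ndvi_arr[mask])

# Tiny synthetic example: more vegetated pixels (higher NDVI) tend to be cooler.
rng = np.random.default_rng(1)
red = rng.uniform(0.05, 0.3, size=10_000)
nir = rng.uniform(0.2, 0.6, size=10_000)
ndvi_arr = ndvi(red, nir)
lst = 40 - 12 * ndvi_arr + rng.normal(0, 1.5, size=10_000)  # synthetic relationship

r, p = lst_ndvi_correlation(lst, ndvi_arr)
print(f"Pearson r = {r:.2f} (p = {p:.3g})")
```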

Main results:

Areas with higher building density, such as the central and southern neighborhoods, showed higher surface temperatures. The presence of urban parks reduced the temperature of their immediate surroundings by 3 to 5 °C. It was confirmed that the heat island effect intensifies at night, especially during persistent heat waves.

This type of analysis is especially useful for designing urban greening strategies and for justifying interventions in vulnerable neighbourhoods.

Barcelona Climate Vulnerability Atlas

Barcelona City Council, in collaboration with experts in public health and urban geography, developed a Climate Vulnerability Atlas which includes detailed maps of heat exposure, population sensitivity, and adaptive capacity. The objective was to guide municipal policies against climate change, especially in the field of health and social services.

Figure 3. Image containing fence, exterior, buildings and grass. Source: generated with AI

Methodology:

The atlas was developed by combining open and administrative data at the census tract level. Three dimensions were analysed: exposure (air temperature and surface data), sensitivity (advanced age, density, morbidity) and adaptive capacity (access to green areas, quality of housing, facilities). The indicators were normalized and combined through multi-criteria spatial analysis to generate a climate vulnerability index. The result made it possible to locate the neighbourhoods most at risk from extreme heat and to guide municipal measures.
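The normalisation and multi-criteria combination step can be illustrated with a small, entirely hypothetical example; the indicator values and the equal weights are illustrative and not those of the actual atlas.

```python
import pandas as pd

# Hypothetical census-tract indicators for the three dimensions described above.
tracts = pd.DataFrame({
    "tract_id": ["A", "B", "C"],
    "summer_air_temp": [31.2, 29.8, 33.1],      # exposure
    "share_over_75": [0.18, 0.09, 0.22],        # sensitivity
    "green_area_per_capita": [2.1, 9.4, 0.8],   # adaptive capacity (higher = better)
})

def minmax(series, invert=False):
    """Scale a series to 0-1; invert when a high value means lower vulnerability."""
    scaled = (series - series.min()) / (series.max() - series.min())
    return 1 - scaled if invert else scaled

# Equal weights are an illustrative assumption; a real atlas would calibrate them.
tracts["vulnerability_index"] = (
    minmax(tracts["summer_air_temp"]) / 3
    + minmax(tracts["share_over_75"]) / 3
    + minmax(tracts["green_area_per_capita"], invert=True) / 3
)

print(tracts.sort_values("vulnerability_index", ascending=False))
```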

Main results:

Based on the atlas, the network of "climate shelters" was designed, which includes libraries, civic centers, schools and conditioned parks, activated during episodes of extreme heat. The selection of these spaces was based directly on the atlas data.

Multitemporal analysis of the heat island effect in Seville

Researchers from the University of Seville used satellite data from Sentinel-3 and Landsat 8 to study the evolution of the heat island phenomenon in the city between 2015 and 2022. The aim was to evaluate the effectiveness of certain urban actions – such as the "Green your neighbourhood" plan – and to anticipate the effects of climate change on the city.

Methodology:

Thermal imaging and NDVI data were used to calculate temperature differences between urban areas and surrounding rural areas. Supervised classification techniques were also applied to identify land uses and their evolution. Open data from tree inventories and urban shade maps were used to interpret the results.

Main results:

Specific renaturation actions have a very positive local impact, but their effect on the city as a whole is limited if they are not integrated into a metropolitan-scale strategy. The study concluded that a continuous network of vegetation and bodies of water is more effective than isolated actions.

European comparison of the Urban Heat Island Atlas (Copernicus) project

Although it is not a Spanish study, the viewer developed by Copernicus for the European Urban Atlas programme offers a comparative analysis between European cities.

Methodology:

The viewer integrates Sentinel-3 thermal imagery, land cover data, and urban mapping to assess the severity of the heat island effect.


Figure 4. Illustration: Infographic showing the main factors causing the urban heat island effect (UHI). Urban areas retain heat due to tall buildings, impermeable surfaces and heat-retaining materials, while green areas are cooler. Source: Urban heat islands.

Main results:

This type of tool allows smaller cities to have a first approximation of the phenomenon without the need to develop their own models. As it is based on open and free data, the viewer allows direct consultations by technicians and citizens.

Current limitations and challenges

Despite progress in opening up data, there are still significant challenges:

  • Territorial inequality: not all cities have the same quality and quantity of data.

  • Irregular update: Some sets are released on a one-off basis and are not updated regularly.

  • Low granularity: Data is often aggregated by districts or census tracts, making street-scale interventions difficult.

  • Lack of technical capacities: Many local governments do not have staff specialized in geospatial analysis.

  • Little connection with citizens: the knowledge generated from data does not always translate into visible or understandable actions for the population.

Conclusion: building climate resilience from geospatial data

Urban heat islands are not a new phenomenon, but in the context of climate change they take on a critical dimension. Cities that do not plan based on data will be increasingly exposed to episodes of extreme heat, with unequal impacts among their populations.

Open data—and in particular geospatial data—offers a strategic opportunity to transform this threat into a lever for change. With them we can identify, anticipate, intervene and evaluate. But for this to happen, it is essential to:

·        Consolidate accessible, up-to-date and quality data infrastructures.

·        Promote collaboration between levels of government, research centres and citizens.

·        Train municipal technicians in the use of geospatial tools.

·        Promote a culture of evidence-based decision-making and climate sensitivity.

Data does not replace policy, but it allows policy to be better grounded, improved and made more equitable. In a global warming scenario, having open geospatial data is a key tool to make our cities more livable and better prepared for the future.


Content prepared by Mayte Toscano, Senior Consultant in Data Economy Technologies. The contents and points of view reflected in this publication are the sole responsibility of the author.

Blog

Sport has always been characterized by generating a lot of data: statistics, graphs... But accumulating figures is not enough. It is necessary to analyze the data, draw conclusions and make decisions based on it. The advantages of sharing data in this sector go beyond sport itself, having a positive impact on health and the economy.

Artificial intelligence (AI) has also reached the professional sports sector and its ability to process huge amounts of data has opened the door to making the most of the potential of all that information. Manchester City, one of the best-known football clubs in the British Premier League, was one of the pioneers in using artificial intelligence to improve its sporting performance:  it uses AI algorithms for the selection of new talent and has collaborated in the development of WaitTime, an artificial intelligence platform that manages the attendance of crowds in large sports and leisure venues. In Spain, Real Madrid, for example, incorporated the use of artificial intelligence a few years ago and promotes forums on the impact of AI on sport.

Artificial intelligence systems analyze extensive volumes of data collected during training and competitions, and are able to provide detailed evaluations on the effectiveness of strategies and optimization opportunities. In addition, it is possible to develop alerts on injury risks, allowing prevention measures to be established, or to create personalized training plans that are automatically adapted to each athlete according to their individual needs. These tools have completely changed contemporary high-level sports preparation. In this post we are going to review some of these use cases.

From simple observation to complete data management to optimize results

Traditional methods of sports evaluation have evolved into highly specialized technological systems. Artificial intelligence and machine learning tools process massive volumes of information during training and competitions, converting statistics, biometric data and audiovisual content into  strategic insights for the management of athletes' preparation and health.

Real-time performance analysis systems  are one of the most established implementations in the sports sector. To collect this data, it is common to see athletes training with bands or vests that monitor different parameters in real time. Both these and other devices and sensors record movements, speeds and biometric data. Heart rate, speed or acceleration are some of the most common data. AI algorithms process this information, generating immediate results that help optimize personalized training programs for each athlete and tactical adaptations, identifying patterns to locate areas for improvement.

In this sense, sports artificial intelligence platforms evaluate both individual performance and collective dynamics in the case of team sports. To evaluate the tactical area, different types of data are analyzed according to the sports modality. In endurance disciplines, speed, distance, rhythm or power are examined, while in team sports data on the position of the players or the accuracy of passes or shots are especially relevant.

Another advance is AI cameras, which allow you to follow the trajectory of players on the field and the movements of different elements, such as the ball in ball sports. These systems generate a multitude of data on positions, movements and patterns of play. The analysis of these historical data sets allows us to identify strategic strengths and vulnerabilities, both our own and those of our opponents. This helps to generate different tactical options and improve decision-making before a competition.

Health and well-being of athletes

Sports injury prevention systems analyze historical data and metrics in real time. Their algorithms identify injury risk patterns, allowing personalized preventive measures to be taken for each athlete. In the case of football, teams such as Manchester United, Liverpool, Valencia CF and Getafe CF have been implementing these technologies for several years.

In addition to the data we have seen above, sports monitoring platforms also record physiological variables continuously: heart rate, sleep patterns, muscle fatigue and movement biomechanics. Wearable devices  with artificial intelligence capabilities detect indicators of fatigue, imbalances, or physical stress that precede injuries. With this data, the algorithms predict patterns that detect risks and make it easier to act preventively, adjusting training or developing specific recovery programs before an injury occurs. In this way, training loads, rep volume, intensity and recovery periods can be calibrated according to individual profiles. This predictive maintenance for athletes is especially relevant for teams and clubs in which athletes are not only sporting assets, but also economic ones. In addition, these systems also optimise sports rehabilitation processes, reducing recovery times in muscle injuries by up to 30% and providing predictions on the risk of relapse.

While not foolproof, the data indicates that these platforms predict approximately 50% of injuries during sports seasons, although they cannot predict exactly when they will occur. The application of AI to healthcare in sport thus contributes to extending professional sports careers, facilitating optimal performance and the athlete's long-term well-being.
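As a purely illustrative sketch of how such a risk model could be prototyped with open tooling, the example below trains a logistic regression on weekly monitoring features. The CSV file and column names are assumptions, not any club's actual data pipeline.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical weekly monitoring features per athlete; column names are illustrative.
data = pd.read_csv("athlete_weeks.csv")  # assumed export from wearable/GPS platforms
features = ["acute_load", "chronic_load", "sleep_hours", "hr_variability"]
X, y = data[features], data["injured_next_week"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"Hold-out accuracy: {model.score(X_test, y_test):.2f}")

# Probability of injury for the most recent week of each athlete:
latest = data.groupby("athlete_id").tail(1)
latest = latest.assign(risk=model.predict_proba(latest[features])[:, 1])
print(latest[["athlete_id", "risk"]].sort_values("risk", ascending=False).head())
```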

Improving the audience experience

Artificial intelligence is also revolutionizing the way fans enjoy sport, both in stadiums and at home. Thanks to natural language processing (NLP) systems, viewers can follow comments and subtitles in real time, facilitating access for people with hearing impairments or speakers of other languages. Manchester City has recently incorporated this technology for the generation of real-time subtitles on the screens of its stadium. These applications have also reached other sports disciplines: IBM Watson has developed a functionality that allows Wimbledon fans to watch the videos with highlighted commentary and AI-generated subtitles.

In addition, AI optimises the management of large capacities through sensors and predictive algorithms, speeding up access, improving security and customising services such as seat locations. Even in broadcasts, AI-powered tools offer instant statistics,  automated highlights, and smart cameras that follow the action without human intervention, making the experience more immersive and dynamic. The NBA uses Second Spectrum, a system that combines cameras with AI to analyze player movements and create visualizations, such as passing routes or shot probabilities. Other sports, such as golf or Formula 1, also use similar tools that enhance the fan experience.

Data privacy and other challenges

The application of AI in sport also poses significant ethical challenges. The collection and analysis of biometric information raises doubts about the security and protection of athletes' personal data, so it is necessary to establish protocols that guarantee the management of consent, as well as the ownership of such data.

Equity  is another concern, as the application of artificial intelligence gives competitive advantages to teams and organizations with greater economic resources, which can contribute to perpetuating inequalities.

Despite these challenges, artificial intelligence has radically transformed the professional sports landscape. The future of sport seems to be linked to the evolution of this technology. Its application promises to continue to elevate athlete performance and the public experience, although some challenges need to be overcome.

Blog

Imagine you want to know how many terraces there are in your neighbourhood, how the pollen levels in the air you breathe every day are evolving or whether recycling in your city is working well. All this information exists in your municipality's databases, but it sits in spreadsheets and technical documents that only experts know how to interpret.

This is where open data visualisation initiatives come in: they transform those seemingly cold numbers into stories that anyone can understand at a glance. A colourful graph showing the evolution of traffic on your street, an interactive map showing the green areas of your city, or an infographic explaining how the municipal budget is spent. These tools make public information accessible, useful and, moreover, comprehensible to all citizens.

Moreover, the advantages of this type of solution are not only for the citizens, but also benefit the Administration that carries out the exercise, because it allows:

  • Detect and correct data errors.
  • Add new sets to the portal.
  • Reduce the number of questions from citizens.
  • Generate more trust on the part of society.

Therefore, visualising open data brings government closer to citizens, facilitates informed decision-making, helps public administrations to improve their open data offer and creates a more participatory society where we can all better understand how the public sector works. In this post, we present some examples of open data visualisation initiatives in regional and municipal open data portals.

Visualiza Madrid: bringing data closer to the public

Madrid City Council's open data portal has developed the initiative "Visualiza Madrid", a project born with the specific objective of making open data and its potential reach the general public, transcending specialised technical profiles. As Ascensión Hidalgo Bellota, Deputy Director General for Transparency of Madrid City Council, explained during the IV National Meeting on Open Data, "this initiative responds to the need to democratise access to public information".

Visualiza Madrid currently has 29 visualisations that cover different topics of interest to citizens, from information on hotel and restaurant terraces to waste management and urban traffic analysis. This thematic diversity demonstrates the versatility of visualisations as a tool for communicating information from very diverse sectors of public administration.

In addition, the initiative has received external recognition this year through the Audaz 2025 Awards, an initiative of the Spanish chapter of the Open Government Academic Network (RAGA Spain).

Castilla y León: comprehensive analysis of regional data

The Junta de Castilla y León has also developed a portal specialised in analysis and visualisations that stands out for its comprehensive approach to the presentation of regional data. Its visualisation platform offers a systematic approach to the analysis of regional information, allowing users to explore different dimensions of the reality of Castilla y León through interactive and dynamic tools.

This initiative allows complex information to be presented in a structured and understandable way, facilitating both academic analysis and citizen use of the data. The platform integrates different sources of regional information, creating a coherent ecosystem of visualisations that provides a panoramic view of different aspects of regional management. Among the topics it offers are data on tourism, the labour market and budget execution. All the visualisations are built with open datasets from the regional portal of Castilla y León.

The Castilla y León approach demonstrates how visualisations can serve as a tool for territorial analysis, providing valuable insights on economic, social and demographic dynamics that are fundamental for the planning and evaluation of regional public policies.

Canary Islands: technological integration with interactive widgets

On the other hand, the Government of the Canary Islands has opted for an innovative strategy through the implementation of widgets that allow the integration of open data visualisations of the Instituto Canario de Estadística (ISTAC) in different platforms and contexts. This technological approach represents a qualitative leap in the distribution and reuse of public data visualisations.

The widgets developed by the Canary Islands make it easier for third parties to embed official visualisations in their own applications, websites or analyses, exponentially expanding the scope and usefulness of Canary Islands open data. This strategy not only multiplies the points of access to public information, but also fosters the creation of a collaborative ecosystem where different actors can benefit from and contribute to the value of open data.

The Canarian initiative illustrates how technology can be used to create scalable and flexible solutions that maximise the impact of investments in open data visualisation, establishing a replicable model for other administrations seeking to amplify the reach of their transparency initiatives.

Lessons learned and best practices

By way of example, the cases analysed reveal common patterns that can serve as a guide for future initiatives. The orientation towards the general public, beyond specialised technical users, emerges as an opportunity factor for the success of these platforms. To maintain the interest and relevance of the visualisations, it is important to offer thematic diversity and to update the data regularly.

Technological integration and interoperability, as demonstrated in the case of the Canary Islands, open up new possibilities to maximise the impact of public investments in data visualisation. Likewise, external recognition and participation in professional networks, as evidenced in the case of Madrid, contribute to continuous improvement and the exchange of best practices between administrations.

In general terms, open data visualisation initiatives represent a very valuable opportunity in the transparency and open government strategy of Spanish public administrations. The cases of Madrid, Castilla y León, as well as the Canary Islands, are examples of the enormous potential for transforming public data into tools for citizen empowerment and improved public management.

The success of these initiatives lies in their ability to connect government information with the real needs of citizens, creating bridges of understanding that strengthen the relationship between administration and society. As these experiences mature and consolidate, it will be crucial to keep the focus on the usability, accessibility and relevance of visualisations, ensuring that open data truly delivers on its promise to contribute to a more informed, participatory and democratic society.

Open data visualisation is not just a technical issue, but a strategic opportunity to redefine public communication and strengthen the foundations of a truly open and transparent administration.

Blog

In an increasingly interconnected and complex world, geospatial intelligence (GEOINT) has become an essential tool for defence and security decision-making. The ability to collect, analyse and interpret geospatial data enables armed forces and security agencies to better understand the operational environment, anticipate threats and plan operations more effectively.

In this context, satellite data, classified but also open data, have acquired significant relevance. Programmes such as Copernicus of the European Union provide free and open access to a wide range of Earth observation data, which democratises access to critical information and fosters collaboration between different actors.

This article explores the role of data in geospatial intelligence applied to defence, highlighting its importance, applications and Spain's leadership in this field.

Geospatial intelligence (GEOINT) is a discipline that combines the collection, analysis and interpretation of geospatial data to support decision making in a variety of areas, including defence, security and emergency management. This data may include satellite imagery, remotely sensed information, geographic information system (GIS) data and other sources that provide information on the location and characteristics of the terrain.

In the defence domain, GEOINT enables military analysts and planners to gain a detailed understanding of the operational environment, identify potential threats and plan operations with greater precision. It also facilitates coordination between different units and agencies, improving the effectiveness of joint operations.

Defence application

The integration of open satellite data into geospatial intelligence has significantly expanded defence capabilities. Some of the most relevant applications are presented below:

  • Surveillance and reconnaissance: open satellite data enables continuous, wide-ranging surveillance, which is essential for the reconnaissance of areas of strategic interest. This includes monitoring troop movements, identifying military infrastructure and assessing changes in the terrain that may indicate hostile activity.

  • Operations planning: detailed geospatial information is crucial for planning military operations. It allows planners to assess the terrain, identify access and exit routes, and anticipate possible obstacles or threats in the operational environment.

  • Emergency response: in crisis situations, such as natural disasters or armed conflicts, open satellite data provides up-to-date and accurate information that is vital for a rapid and effective response. This includes damage assessment, identification of affected areas and planning of humanitarian aid operations.

  • Decision support: geospatial intelligence based on open data improves decision-making by providing a more complete understanding of the operational environment. This is especially important in complex and dynamic contexts, where accurate and timely information can make the difference between the success and failure of an operation.

Figure 1. GEOINT applications in defence. Source: own elaboration

Geospatial intelligence not only supports the military in making tactical decisions, but also transforms the way military, surveillance and emergency response operations are planned and executed. Here we present concrete use cases where GEOINT, supported by open satellite data, has had a decisive impact.

Monitoring of military movements in conflicts

Case. Ukraine War (2022-2024)

Organisations such as the EU Satellite Centre (SatCen) and NGOs such as the Conflict Intelligence Team have used Sentinel-1 and Sentinel-2 (Copernicus) imagery to:

  • Detect concentrations of Russian troops and military equipment.
  • Analyse changes to airfields, bases or logistics routes.
  • Support independent verification of events on the ground.

This has been key to EU and NATO decision-making, without the need to resort to classified data.

Maritime surveillance and border control

Case. FRONTEX operations in the Mediterranean

GEOINT powered by Sentinel-1 (radar) and Sentinel-3 (optical + altimeter) allows:

  • Identify unauthorised vessels, even under cloud cover or at night.
  • Integrate alerts with AIS (automatic ship identification system).
  • Coordinate rescue and interdiction operations.

Advantage: Sentinel-1's synthetic aperture radar (SAR) can see through clouds, making it ideal for continuous surveillance.

Support to peace missions and humanitarian aid

Case. Earthquake in Syria/Turkey (2023)

Open data (Sentinel-2, Landsat-8, and PlanetScope imagery made freely available after the catastrophe) were used to (a minimal sketch follows this list):

  • Detect collapsed areas and assess damage.
  • Plan safe access routes.

  • Coordinate camps and resources with military support.
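A minimal sketch of the kind of change detection involved is shown below, assuming two co-registered Sentinel-2 band files with hypothetical names; operational damage mapping uses far more sophisticated methods, so this only illustrates the principle.

```python
import numpy as np
import rasterio

# Hypothetical co-registered Sentinel-2 scenes acquired before and after the event.
def read_band(path):
    with rasterio.open(path) as src:
        return src.read(1).astype("float64")

pre = read_band("S2_pre_event_B04.tif")    # red band, pre-event
post = read_band("S2_post_event_B04.tif")  # red band, post-event

# Simple change detection: normalised absolute difference between acquisitions.
change = np.abs(post - pre) / (pre + post + 1e-6)

# Flag pixels whose reflectance changed strongly; the 0.25 threshold is illustrative
# and would be tuned (or replaced by a classifier) in an operational workflow.
damage_mask = change > 0.25
print(f"Potentially affected area: {damage_mask.mean() * 100:.1f}% of the scene")
```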

Spain's role

Spain has demonstrated a significant commitment to the development and application of geospatial intelligence in defence.

European Union Satellite Centre (SatCen)

Located in Torrejón de Ardoz, SatCen is a European Union agency that provides geospatial intelligence products and services to support security and defence decision-making. Spain, as host country, plays a central role in SatCen operations.

Project Zeus of the Spanish Army

The Spanish Army has launched the Zeus project, a technological initiative that integrates artificial intelligence, 5G networks and satellite data to improve operational capabilities. This project aims to create a tactical combat cloud to enable greater interoperability and efficiency in military operations.

Participation in European programmes

Spain actively participates in European programmes related to Earth observation and geospatial intelligence, such as Copernicus and MUSIS. In addition, it collaborates in bilateral and multilateral initiatives for satellite capacity building and data sharing.

National capacity building

At the national level, Spain has invested in the development of its own geospatial intelligence capabilities, including the training of specialised personnel and the acquisition of advanced technologies. These investments reinforce the country's strategic autonomy and its ability to contribute to international operations.

Figure 2. Comparative table of Spain's participation in different satellite projects. Source: own elaboration

Challenges and opportunities

While open satellite data offers many advantages, it also presents certain challenges that need to be addressed to maximise its usefulness in the defence domain.

  • Data quality and resolution: While open data is valuable, it often has limitations in terms of spatial and temporal resolution compared to commercial or classified data. This may affect its applicability in certain operations requiring highly accurate information.

  • Data integration: The integration of data from multiple sources, including open, commercial and classified data, requires systems and processes to ensure interoperability and consistency of information. This involves technical and organisational challenges that must be overcome.

  • Security and confidentiality: The use of open data in defence contexts raises questions about the security and confidentiality of information. It is essential to establish security protocols and measures to protect sensitive information and prevent its misuse.

  • Opportunities for collaboration: Despite these challenges, open satellite data offer significant opportunities for collaboration between different actors, including governments, international organisations, the private sector and civil society. Such collaboration can improve the effectiveness of defence operations and contribute to greater global security.

Recommendations for strengthening the use of open data in defence

Based on the above analysis, some key recommendations can be drawn to better exploit the potential of open data:

  1. Strengthening open data infrastructures: consolidate national platforms integrating open satellite data for both civilian and military use, with a focus on security and interoperability.

  2. Promotion of open geospatial standards (OGC, INSPIRE): Ensure that defence systems integrate international standards that allow the combined use of open and classified sources.

  3. Specialised training: foster the development of capabilities in GEOINT analysis with open data, both in the military domain and in collaboration with universities and technology centres.

  4. Civil-military cooperation: establish protocols to facilitate the exchange of data between civilian agencies (AEMET, IGN, Civil Protection) and defence actors in crisis or emergency situations.

  5. Support to R&D&I: foster research projects exploring the advanced use of open data (e.g. AI applied to Sentinel) with dual applications (civilian and security).

Conclusion

Geospatial intelligence and the use of open satellite data have transformed the way armed forces and security agencies plan and execute their operations. In a context of multidimensional threats and constantly evolving scenarios, having accurate, accessible and up-to-date information is more than an advantage: it is a strategic necessity.

Open data has established itself as a fundamental asset not only because it is free of charge, but also because of its ability to democratise access to critical information, foster transparency and enable new forms of collaboration between military, civilian and scientific actors. In particular:

  • Improve the resilience of defence systems by enabling broader, cross-cutting analysis of the operating environment.
  • Increase interoperability, as open formats and standards facilitate exchange between countries and agencies.
  • Drive innovation by providing startups, research centres and universities with access to quality data that would otherwise be inaccessible.

In this context, Spain has demonstrated a clear commitment to this strategic vision, both through its national institutions and its active role in European programmes such as Copernicus, Galileo and the common defence missions.


Content prepared by Mayte Toscano, Senior Consultant in Data Economy Technologies. The contents and points of view reflected in this publication are the sole responsibility of the author.

Blog

Data is a fundamental resource for improving our quality of life because it enables better decision-making processes to create personalised products and services, both in the public and private sectors. In contexts such as health, mobility, energy or education, the use of data facilitates more efficient solutions adapted to people's real needs. However, in working with data, privacy plays a key role. In this post, we will look at how data spaces, the federated computing paradigm and federated learning, one of its most powerful applications, provide a balanced solution for harnessing the potential of data without compromising privacy. In addition, we will highlight how federated learning can also be used with open data to enhance its reuse in a collaborative, incremental and efficient way.

Privacy, a key issue in data management

As mentioned above, the intensive use of data requires increasing attention to privacy. For example, in eHealth, secondary misuse of electronic health record data could violate patients' fundamental rights. One effective way to preserve privacy is through data ecosystems that prioritise data sovereignty, such as data spaces. A dataspace is a federated data management system that allows data to be exchanged reliably between providers and consumers. In addition, the data space ensures the interoperability of data to create products and services that create value. In a data space, each provider maintains its own governance rules, retaining control over its data (i.e. sovereignty over its data), while enabling its re-use by consumers. This implies that each provider should be able to decide what data it shares, with whom and under what conditions, ensuring compliance with its interests and legal obligations.

Federated computing and data spaces

Data spaces represent an evolution in data management, related to a paradigm called federated computing, where data is reused without the need for data to flow from providers to consumers. In federated computing, providers transform their data into privacy-preserving intermediate results so that they can be sent to data consumers. In addition, this enables other privacy-enhancing technologies (PETs) to be applied. Federated computing aligns perfectly with reference architectures such as Gaia-X and its Trust Framework, which sets out the principles and requirements to ensure secure, transparent and rule-compliant data exchange between data providers and data consumers.

Federated learning

One of the most powerful applications of federated computing is federated machine learning (federated learning), an artificial intelligence technique that allows models to be trained without centralising data. That is, instead of sending the data to a central server for processing, what is sent are the models trained locally by each participant.

These models are then combined centrally to create a global model. As an example, imagine a consortium of hospitals that wants to develop a predictive model to detect a rare disease. Every hospital holds sensitive patient data, and open sharing of this data is not feasible due to privacy concerns (including other legal or ethical issues). With federated learning, each hospital trains the model locally with its own data, and only shares the model parameters (training results) centrally. Thus, the final model leverages the diversity of data from all hospitals without compromising the individual privacy and data governance rules of each hospital.

Training in federated learning usually follows an iterative cycle:

  1. A central server starts a base model and sends it to each of the participating distributed nodes.
  2. Each node trains the model locally with its data.
  3. Nodes return only the parameters of the updated model, not the data (i.e. data shuttling is avoided).
  4. The central server aggregates parameter updates, training results at each node and updates the global model.
  5. The cycle is repeated until a sufficiently accurate model is achieved.


Figure 1. Visual representation of the federated learning training process. Source: own elaboration.

This approach is compatible with various machine learning algorithms, including deep neural networks, regression models, classifiers, etc.
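To make the cycle above concrete, here is a minimal federated averaging (FedAvg) sketch in plain NumPy with three simulated nodes; it illustrates the idea rather than any specific platform's implementation, and the model is a simple logistic regression on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_training(weights, X, y, lr=0.1, epochs=20):
    """One node trains a logistic regression locally; only the weights leave the node."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1 / (1 + np.exp(-X @ w))
        gradient = X.T @ (preds - y) / len(y)
        w -= lr * gradient
    return w

# Three simulated nodes (e.g. hospitals) with private datasets that never leave them.
nodes = [(rng.normal(size=(100, 3)), rng.integers(0, 2, 100).astype(float)) for _ in range(3)]

global_w = np.zeros(3)
for round_ in range(10):                                            # iterative federated cycle
    local_ws = [local_training(global_w, X, y) for X, y in nodes]   # step 2: local training
    sizes = np.array([len(y) for _, y in nodes], dtype=float)
    # Steps 3-4: only parameters are shared; the server averages them weighted by data size.
    global_w = np.average(local_ws, axis=0, weights=sizes)

print("Global model after federated averaging:", np.round(global_w, 3))
```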

Benefits and challenges of federated learning

Federated learning offers multiple benefits by keeping data at source and avoiding data movement. Below are the most notable examples:

  1. Privacy and compliance: by remaining at source, data exposure risks are significantly reduced and compliance with regulations such as the General Data Protection Regulation (GDPR) is facilitated.
  2. Data sovereignty: Each entity retains full control over its data, which avoids competitive conflicts.
  3. Efficiency: avoids the cost and complexity of exchanging large volumes of data, speeding up processing and development times.
  4. Trust: facilitates frictionless collaboration between organisations.

There are several use cases in which federated learning is necessary, for example:

  • Health: Hospitals and research centres can collaborate on predictive models without sharing patient data.
  • Finance: banks and insurers can build fraud detection or risk-sharing analysis models, while respecting the confidentiality of their customers.
  • Smart tourism: tourist destinations can analyse visitor flows or consumption patterns without the need to unify the databases of their stakeholders (both public and private).
  • Industry: Companies in the same industry can train models for predictive maintenance or operational efficiency without revealing competitive data.

While its benefits are clear in a variety of use cases, federated learning also presents technical and organisational challenges:

  • Data heterogeneity: Local data may have different formats or structures, making training difficult. In addition, the layout of this data may change over time, which presents an added difficulty.
  • Unbalanced data: Some nodes may have more or higher quality data than others, which may skew the overall model.
  • Local computational costs: Each node needs sufficient resources to train the model locally.
  • Synchronisation: the training cycle requires good coordination between nodes to avoid latency or errors.

Beyond federated learning

Although the most prominent application of federated computing is federated learning, many additional applications in data management are emerging, such as federated data analytics (federated analytics). Federated data analysis allows statistical and descriptive analyses to be performed on distributed data without the need to move the data to the consumers; instead, each provider performs the required statistical calculations locally and only shares the aggregated results with the consumer according to their requirements and permissions. The following table shows the differences between federated learning and federated data analysis.

 

Criteria                 | Federated learning                                                              | Federated data analysis
Target                   | Prediction and training of machine learning models.                            | Descriptive analysis and calculation of statistics.
Task type                | Predictive tasks (e.g. classification or regression).                          | Descriptive tasks (e.g. means or correlations).
Example                  | Training disease diagnosis models using medical images from several hospitals. | Calculating health indicators for a health area without moving data between hospitals.
Expected output          | A trained global model.                                                         | Aggregated statistical results.
Nature                   | Iterative.                                                                      | Direct.
Computational complexity | High.                                                                           | Medium.
Privacy and sovereignty  | High.                                                                           | Medium.
Algorithms               | Machine learning algorithms.                                                    | Statistical algorithms.

Figure 2. Comparative table of federated learning and federated data analysis. Source: own elaboration
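As a complement to the comparison, here is a minimal sketch of federated data analysis: each node exposes only local aggregates (count and sum), and the consumer derives the global statistic without ever seeing a raw record. The node names and values are invented for illustration.

```python
# Minimal federated analytics sketch: only aggregates leave each provider.
node_data = {
    "hospital_a": [72, 81, 64, 90],   # local measurements that never leave the node
    "hospital_b": [68, 77, 85],
    "hospital_c": [91, 70, 73, 66, 88],
}

def local_aggregate(values):
    """Computed inside each provider; only (count, sum) is exposed."""
    return len(values), sum(values)

aggregates = [local_aggregate(v) for v in node_data.values()]
total_count = sum(c for c, _ in aggregates)
total_sum = sum(s for _, s in aggregates)

print(f"Global mean computed from aggregates only: {total_sum / total_count:.2f}")
```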

Federated learning and open data: a symbiosis to be explored

In principle, open data resolves privacy issues prior to publication, so one would think that federated learning techniques would not be necessary. Nothing could be further from the truth. The use of federated learning techniques can bring significant advantages in the management and exploitation of open data. In fact, the first aspect to highlight is that open data portals such as datos.gob.es or data.europa.eu are federated environments. Therefore, in these portals, the application of federated learning on large datasets would allow models to be trained directly at source, avoiding transfer and processing costs. On the other hand, federated learning would facilitate the combination of open data with other sensitive data without compromising the privacy of the latter. Finally, the nature of a wide variety of open data types is very dynamic (such as traffic data), so federated learning would enable incremental training, automatically considering new updates to open datasets as they are published, without the need to restart costly training processes.

Federated learning, the basis for privacy-friendly AI

Federated machine learning represents a necessary evolution in the way we develop artificial intelligence services, especially in contexts where data is sensitive or distributed across multiple providers. Its natural alignment with the concept of the data space makes it a key technology to drive innovation based on data sharing, taking into account privacy and maintaining data sovereignty.

As regulation (such as the European Health Data Space Regulation) and data space infrastructures evolve, federated learning, and other types of federated computing, will play an increasingly important role in data sharing, maximising the value of data, but without compromising privacy. Finally, it is worth noting that, far from being unnecessary, federated learning can become a strategic ally to improve the efficiency, governance and impact of open data ecosystems.


Jose Norberto Mazón, Professor of Computer Languages and Systems at the University of Alicante. The contents and views reflected in this publication are the sole responsibility of the author.

Documentación

In the current landscape of data analysis and artificial intelligence, the automatic generation of comprehensive and coherent reports represents a significant challenge. While traditional tools allow for data visualization or generating isolated statistics, there is a need for systems that can investigate a topic in depth, gather information from diverse sources, and synthesize findings into a structured and coherent report.

In this practical exercise, we will explore the development of a report generation agent based on LangGraph and artificial intelligence. Unlike traditional approaches based on templates or predefined statistical analysis, our solution leverages the latest advances in language models to:

  1. Create virtual teams of analysts specialized in different aspects of a topic.
  2. Conduct simulated interviews to gather detailed information.
  3. Synthesize the findings into a coherent and well-structured report.

Access the data laboratory repository on Github.

Run the data preprocessing code on Google Colab.

As shown in Figure 1, the complete agent flow follows a logical sequence that goes from the initial generation of questions to the final drafting of the report.


Figure 1. Agent flow diagram.

Application Architecture

The core of the application is based on a modular design implemented as an interconnected state graph, where each module represents a specific functionality in the report generation process. This structure allows for a flexible workflow, recursive when necessary, and with capacity for human intervention at strategic points.

Main Components

The system consists of three fundamental modules that work together:

1. Virtual Analysts Generator

This component creates a diverse team of virtual analysts specialized in different aspects of the topic to be investigated. The flow includes:

  • Initial creation of profiles based on the research topic.
  • Human feedback point that allows reviewing and refining the generated profiles.
  • Optional regeneration of analysts incorporating suggestions.

This approach ensures that the final report includes diverse and complementary perspectives, enriching the analysis.
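A hedged sketch of how this first module could be wired with LangGraph is shown below; it uses assumed state, node and field names rather than the repository's exact code, and the graph is interrupted before the human feedback node so the profiles can be reviewed.

```python
from typing import List
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

# Assumed state and node names; the repository's actual code may differ.
class GenerateAnalystsState(TypedDict):
    topic: str
    human_feedback: str
    analysts: List[dict]

def create_analysts(state: GenerateAnalystsState):
    # In the real exercise an LLM generates the profiles from the topic and feedback;
    # a placeholder is returned here to keep the sketch self-contained.
    return {"analysts": [{"name": "Analyst 1", "focus": state["topic"]}]}

def human_feedback(state: GenerateAnalystsState):
    # No-op node: the graph is interrupted *before* this node so a person can
    # inspect the generated profiles and inject feedback into the state.
    pass

def should_regenerate(state: GenerateAnalystsState):
    # Regenerate analysts if feedback was provided, otherwise finish this sub-graph.
    return "create_analysts" if state.get("human_feedback") else END

builder = StateGraph(GenerateAnalystsState)
builder.add_node("create_analysts", create_analysts)
builder.add_node("human_feedback", human_feedback)
builder.add_edge(START, "create_analysts")
builder.add_edge("create_analysts", "human_feedback")
builder.add_conditional_edges("human_feedback", should_regenerate, ["create_analysts", END])

graph = builder.compile(checkpointer=MemorySaver(), interrupt_before=["human_feedback"])
```

With this wiring, invoking the compiled graph pauses at the interrupt; the reviewer can then update the state (for example via graph.update_state) and resume execution so that the analysts are regenerated taking the feedback into account.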

2. Interview System

Once the analysts are generated, each one participates in a simulated interview process that includes:

  • Generation of relevant questions based on the analyst's profile.
  • Information search in sources via Tavily Search and Wikipedia.
  • Generation of informative responses combining the obtained information.
  • Automatic decision on whether to continue or end the interview based on the information gathered.
  • Storage of the transcript for subsequent processing.

The interview system represents the heart of the agent, where the information that will nourish the final report is obtained. As shown in Figure 2, this process can be monitored in real time through LangSmith, an observability platform that allows tracking each step of the flow.


Figure 2. System monitoring via LangSmith. Concrete example of an analyst-interviewer interaction.
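The decision to continue or end an interview can be expressed as a conditional routing function. The sketch below uses assumed node and field names and a simple heuristic; the actual repository logic may differ.

```python
from typing_extensions import TypedDict

# Assumed interview state: the accumulated turns and a cap on the number of questions.
class InterviewState(TypedDict):
    turns: list
    max_turns: int

def route_interview(state: InterviewState) -> str:
    """Conditional router: keep interviewing until the turn budget is spent or
    the expert signals that the topic is exhausted."""
    if len(state["turns"]) >= state["max_turns"]:
        return "save_interview"
    if state["turns"] and "thank you for your help" in state["turns"][-1].lower():
        return "save_interview"
    return "ask_question"

# Registered on the interview sub-graph, for example, as:
# builder.add_conditional_edges("answer_question", route_interview,
#                               ["ask_question", "save_interview"])
```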

3. Report Generator

Finally, the system processes the interviews to create a coherent report through:

  • Writing individual sections based on each interview.
  • Creating an introduction that presents the topic and structure of the report.
  • Organizing the main content that integrates all sections.
  • Generating a conclusion that synthesizes the main findings.
  • Consolidating all sources used.

Figure 3 shows an example of the report resulting from the complete process, demonstrating the quality and structure of the final document generated automatically.


Figure 3. View of the report resulting from the automatic generation process for the prompt "Open data in Spain".

 

What can you learn?

This practical exercise allows you to learn:

Integration of advanced AI in information processing systems:

  • How to communicate effectively with language models.
  • Techniques to structure prompts that generate coherent and useful responses.
  • Strategies to simulate virtual teams of experts.

Development with LangGraph:

  • Creation of state graphs to model complex flows.
  • Implementation of conditional decision points.
  • Design of systems with human intervention at strategic points.

Parallel processing with LLMs:

  • Parallelization techniques for tasks with language models.
  • Coordination of multiple independent subprocesses.
  • Methods for consolidating scattered information.

Good design practices:

  • Modular structuring of complex systems.
  • Error handling and retries.
  • Tracking and debugging workflows through LangSmith.

Conclusions and future

This exercise demonstrates the extraordinary potential of artificial intelligence as a bridge between data and end users. Through the practical case developed, we can observe how the combination of advanced language models with flexible architectures based on graphs opens new possibilities for automatic report generation.

The ability to simulate virtual expert teams, perform parallel research and synthesize findings into coherent documents, represents a significant step towards the democratization of analysis of complex information.

For those interested in expanding the capabilities of the system, there are multiple promising directions for its evolution:

  • Incorporation of automatic data verification mechanisms to ensure accuracy.
  • Implementation of multimodal capabilities that allow incorporating images and visualizations.
  • Integration with more sources of information and knowledge bases.
  • Development of more intuitive user interfaces for human intervention.
  • Expansion to specialized domains such as medicine, law or sciences.

In summary, this exercise not only demonstrates the feasibility of automating the generation of complex reports through artificial intelligence, but also points to a promising path towards a future where deep analysis of any topic is within everyone's reach, regardless of their level of technical experience. The combination of advanced language models, graph architectures and parallelization techniques opens a range of possibilities to transform the way we generate and consume information.

Blog

The evolution of generative AI has been dizzying: from the first large language models that impressed us with their ability to reproduce human-like reading and writing, through the advanced RAG (Retrieval-Augmented Generation) techniques that markedly improved the quality of the responses provided and the emergence of intelligent agents, to an innovation that redefines our relationship with technology: Computer use.

At the end of April 2020, just one month after the start of an unprecedented period of worldwide home confinement due to the COVID-19 global pandemic, we covered the large GPT-2 and GPT-3 language models on datos.gob.es. OpenAI, founded in 2015, had presented almost a year earlier (February 2019) a new language model that was able to generate written text virtually indistinguishable from that created by a human. GPT-2 had been trained on a corpus (a set of texts prepared to train language models) of about 40 GB (gigabytes) in size (about 8 million web pages), while the latest family of models based on GPT-4 is estimated to have been trained on corpora of terabyte (TB) size; a thousand times more.

In this context, it is important to talk about two concepts:

  • LLMs (Large Language Models): large-scale language models, trained on vast amounts of data and capable of performing a wide range of linguistic tasks. Today, we have countless tools based on these LLMs that, depending on their field of expertise, are able to generate programming code, ultra-realistic images and videos, and solve complex mathematical problems. All major companies and organisations in the digital-technology sector have embarked on integrating these tools into their different software and hardware products, developing use cases that solve or optimise specific tasks and activities that previously required a high degree of human intervention.
  • Agents: the user experience with artificial intelligence models is becoming more and more complete, so that we can ask our interface not only to answer our questions, but also to perform complex tasks that require integration with other IT tools. For example, we not only ask a chatbot for information on the best restaurants in the area, but also ask it to search for table availability on specific dates and make a reservation for us. This extended user experience is what artificial intelligence agents provide. Building on large language models, these agents are able to interact with the world outside the model and "talk" to other services via APIs and programming interfaces prepared for this purpose (see the illustrative sketch below).
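
As a purely illustrative sketch of this idea (the function names, the decision format and the booking service are invented for this example, not taken from any specific agent framework), an agent boils down to mapping a model's structured decision to an external action:

    from datetime import date

    def search_table_availability(day: date, guests: int) -> list[str]:
        # Stand-in for a real call to a booking service's API.
        return ["20:00", "21:30"] if guests <= 4 else []

    def make_reservation(day: date, time: str, guests: int) -> str:
        # Stand-in for the API call that actually books the table.
        return f"Reserved for {guests} guests on {day} at {time}"

    def run_agent(decision: dict) -> str:
        # In a real agent, `decision` would come from the language model
        # (e.g. a structured tool call); here it is hard-coded for illustration.
        if decision["action"] == "search":
            slots = search_table_availability(decision["day"], decision["guests"])
            return f"Available slots: {slots}"
        if decision["action"] == "reserve":
            return make_reservation(decision["day"], decision["time"], decision["guests"])
        return "No action taken"

    print(run_agent({"action": "search", "day": date(2025, 5, 10), "guests": 2}))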

Computer use

However, the ability of agents to perform actions autonomously depends on two key elements: on the one hand, their concrete programming, that is, the functionality that has been programmed or configured for them; on the other, the need for all other programmes to be ready to "talk" to these agents, meaning that their programming interfaces must be able to receive instructions from them. For example, the restaurant reservation application has to be prepared to receive not only forms filled in by a human, but also requests made by an agent previously invoked by a human using natural language. This imposes a limitation on the set of activities and tasks that we can automate from a conversational interface. In other words, the conversational interface can provide us with almost infinite answers to the questions we ask it, but its ability to interact with the outside world is severely limited because most other computer applications are not prepared for it.

This is where Computer use comes in. With the arrival of the Claude 3.5 Sonnet model, Anthropic has introduced Computer use, a beta capability that allows AI to interact directly with graphical user interfaces.

How does Computer use work?

Claude can move your computer cursor as if it were you, click buttons and type text, emulating the way humans operate a computer. The best way to understand how Computer use works in practice is to see it in action. Here is a direct link to the specific Computer use section of Anthropic's YouTube channel.

Figure 1. Screenshot from Anthropic's YouTube channel, Computer use specific section.

Would you like to try it?

If you've made it this far, you can't leave without trying it with your own hands.

Here is a simple guide to testing Computer use in an isolated environment. It is important to take into account the security recommendations that Anthropic proposes in its Computer use guidelines. This feature of the Claude Sonnet model can perform actions on a computer, which can be potentially dangerous, so it is recommended to carefully review the Computer use security warning.

All official developer documentation can be found in Anthropic's official GitHub repository. In this post, we have chosen to run Computer use in a Docker container environment, which is the easiest and safest way to test it. If you don't already have Docker, you can follow the simple official guidelines to install it on your system.

To reproduce this test, we propose following this script step by step:

  1. Anthropic API Key. To interact with Claude Sonnet you need an Anthropic account, which you can create for free here. Once inside, go to the API Keys section and create a new one for your test.
  2. Once you have your API Key, run this command in your terminal, substituting your key where it says "%your_api_key%" (a reference version of the command is shown below):
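
As a reference only, the command documented in Anthropic's computer-use quickstart repository at the time of writing looked approximately like this; check the image name, ports and volume path against the official repository before running it:

    docker run \
        -e ANTHROPIC_API_KEY=%your_api_key% \
        -v $HOME/.anthropic:/home/computeruse/.anthropic \
        -p 5900:5900 -p 8501:8501 -p 6080:6080 -p 8080:8080 \
        -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest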

  3. If everything went well, you will see the corresponding messages in your terminal; now just open your browser and type this URL in the navigation bar: http://localhost:8080/.

You will see your interface open:

Figure 2. Computer use interface.

You can now explore how Computer use works for yourself.

We suggest you start small. For example, ask it to open a browser and search for something. You can also ask it for information about your computer or operating system. Gradually, you can move on to more complex tasks. We have tested the following prompt and, after several trials, managed to get Computer use to perform the complete task:

Open a browser, navigate to the datos.gob.es catalogue, use the search engine to locate a dataset on: Public security. Traffic accidents. 2014; Locate the file in csv format; download and open it with free Office.

Potential uses in data platforms such as datos.gob.es

In view of this first experimental version of Computer use, the potential of the tool seems very high, and we can imagine many more things we could do with it. Here are some ideas:

  • We could ask the system to perform a complete search of datasets related to a specific topic and summarise the main results in a document. In this way, if, for example, we were writing an article on traffic data in Spain, we could obtain, unattended, a list of the main open datasets on traffic in Spain in the datos.gob.es catalogue.
  • In the same way, we could request a similar summary, but in this case not of datasets, but of other platform items.
  • A slightly more sophisticated example would be to ask Claude, through the conversational interface of Computer use, to make a series of calls to the datos.gob.es API to obtain information from certain datasets programmatically. To do this, we open a browser and log into an application such as Postman (remember that Computer use is in experimental mode and does not allow us to enter sensitive data such as user credentials on web pages). We can then ask it to search for information about the datos.gob.es API and execute an HTTP call, taking advantage of the fact that this API does not require authentication (see the illustrative request after this list).
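
To make the last idea more concrete, the following minimal Python sketch shows the kind of unauthenticated call one could ask Computer use to reproduce in Postman. The endpoint and the _pageSize parameter are based on the public datos.gob.es catalogue API, but both the URL and the response layout should be checked against the official API documentation:

    import requests

    # Minimal sketch of an unauthenticated call to the datos.gob.es catalogue API.
    URL = "https://datos.gob.es/apidata/catalog/dataset.json"

    response = requests.get(URL, params={"_pageSize": 5}, timeout=30)
    response.raise_for_status()

    data = response.json()
    # The API follows a Linked Data API layout; inspect the payload to see how
    # datasets are listed (typically under result -> items).
    print(list(data.keys()))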

Through these simple examples, we hope that we have introduced you to a new application of generative AI and that you have understood the paradigm shift that this new capability represents. If the machine is able to emulate the use of a computer as we humans do, unimaginable new opportunities will open up in the coming months.


Content prepared by Alejandro Alija, expert in Digital Transformation and Innovation. The contents and points of view reflected in this publication are the sole responsibility of the author.

Blog

The enormous acceleration of innovation in artificial intelligence (AI) in recent years has largely revolved around the development of so-called "foundational models". Also known as Large [X] Models (LxM) or Foundation Models, they are defined by the Center for Research on Foundation Models (CRFM) of Stanford University's Institute for Human-Centered Artificial Intelligence (HAI) as models that have been trained on large and highly diverse datasets and can be adapted to perform a wide range of tasks using techniques such as fine-tuning.

It is precisely this versatility and adaptability that has made foundational models the cornerstone of the numerous applications of artificial intelligence being developed, since a single base architecture can be used across a multitude of use cases with limited additional effort.

Types of foundational models

The "X" in LxM can be replaced by several options depending on the type of data or tasks for which the model is specialised. The best known by the public are LLMs (Large Language Models), which are at the basis of applications such as ChatGPT or Gemini and focus on natural language understanding and generation. LVMs (Large Vision Models), such as DINOv2 or CLIP, are designed to interpret images and videos, recognise objects or generate visual descriptions. There are also models such as Operator or Rabbit R1 that fall into the LAM (Large Action Models) category and are aimed at executing actions from complex instructions.

As regulations have emerged in different parts of the world, so have other definitions that seek to establish criteria and responsibilities for these models to foster confidence and security. The most relevant definition for our context is that set out in the European Union AI Regulation (AI Act), which calls them "general-purpose AI models" and distinguishes them by their "ability to competently perform a wide variety of discrete tasks" and because they are "typically trained using large volumes of data and through a variety of methods, such as self-supervised, unsupervised or reinforcement learning".

Foundational models in Spanish and other co-official languages

Historically, English has been the dominant language in the development of large AI models, to the extent that around 90% of the training tokens of today's large models are drawn from English texts. It is therefore logical that the most popular models, for example OpenAI's GPT family, Google's Gemini or Meta's Llama, are more competent at responding in English and perform less well when used in other languages such as Spanish.

Therefore, the creation of foundational models in Spanish, such as ALIA, is not simply a technical or research exercise, but a strategic move to ensure that artificial intelligence does not further deepen the linguistic and cultural asymmetries that already exist in digital technologies in general. The development of ALIA, driven by Spain's Artificial Intelligence Strategy 2024, "based on the broad scope of our languages, spoken by 600 million people, aims to facilitate the development of advanced services and products in language technologies, offering an infrastructure marked by maximum transparency and openness".

Such initiatives are not unique to Spain. Other similar projects include BLOOM, a 176-billion-parameter multilingual model developed by more than 1,000 researchers worldwide and supporting 46 natural languages and 13 programming languages. In China, Baidu has developed ERNIE, a model with strong Mandarin capabilities, while in France the PAGNOL model has focused on improving French capabilities. These parallel efforts show a global trend towards the "linguistic democratisation" of AI.

Since the beginning of 2025, the first language models in Spanish and the co-official languages, developed within the ALIA project, have been available. The ALIA family includes ALIA-40B, a model with 40.4 billion parameters, which is currently the most advanced public multilingual foundational model in Europe and which was trained for more than 8 months on the MareNostrum 5 supercomputer, processing 6.9 trillion tokens, equivalent to about 33 terabytes of text (about 17 million books!). The training corpus includes all kinds of official documents and scientific repositories in Spanish, from congressional journals to official bulletins, to ensure the richness and quality of its knowledge.

Although this is a multilingual model, Spanish and the co-official languages have a much higher weight than usual in these models, around 20%, as the training was designed specifically for these languages, reducing the relevance of English and adapting the tokens to the needs of Spanish, Catalan, Basque and Galician. As a result, ALIA "understands" our local expressions and cultural nuances better than a generic model trained mostly in English.

Applications of the foundational models in Spanish and co-official languages

It is still too early to judge the impact that ALIA, and other models that may be developed from this experience, may have on specific sectors and applications. However, they are expected to serve as a basis for improving many artificial intelligence applications and solutions:

  • Public administration and government: ALIA could give life to virtual assistants that attend to citizens 24 hours a day for procedures such as paying taxes, renewing ID cards, applying for grants, etc., as it is specifically trained in Spanish regulations.  In fact, a pilot for the Tax Agency using ALIA, which would aim to streamline internal procedures, has already been announced.
  • Education: A model such as ALIA could also be the basis for personalised virtual tutors to guide students in Spanish and co-official languages. For example, assistants who explain concepts of mathematics or history in simple language and answer questions from the students, adapting to their level since, knowing our language well, they would be able to provide important nuances in the answers and understand the typical doubts of native speakers in these languages. They could also help teachers by generating exercises or summaries of readings or assisting them in correcting students' work.
  • Health: ALIA could be used to analyse medical texts and assist healthcare professionals with clinical reports, medical records, information leaflets, etc. For example, it could review patient files to extract key elements, or assist professionals in the diagnostic process.  In fact, the Ministry of Health is planning a pilot application with ALIA to improve early detection of heart failure in primary care.
  • Justice: In the legal field, ALIA would understand technical terms and contexts of Spanish law much better than a non-specialised model as it has been trained with legal vocabulary from official documents. An ALIA-based virtual paralegal could be able to answer basic citizen queries, such as how to initiate a given legal procedure, citing the applicable law. The administration of justice could also benefit from much more accurate machine translations of court documents between co-official languages.

Future lines

The development of foundation models in Spanish, as in other languages, is beginning to be seen outside the United States as a strategic issue that contributes to guaranteeing countries' technological sovereignty. Of course, it will be necessary to continue training more advanced versions (models with up to 175 billion parameters are targeted, which would be comparable to the most powerful in the world), incorporating new open data and fine-tuning applications. From the Data Directorate and SEDIA, the intention is to continue supporting the growth of this family of models, keeping it at the forefront and ensuring its adoption.

On the other hand, these first foundational models in Spanish and co-official languages have initially focused on written language, so the next natural frontier could be multimodality. Integrating the capacity to manage images, audio or video in Spanish together with the text would multiply its practical applications, since the interpretation of images in Spanish is one of the areas where the greatest deficiencies are detected in the large generic models.

Ethical issues will also need to be monitored to ensure that these models do not perpetuate bias and are useful for all groups, including those who speak different languages or have different levels of education. In this respect, Explainable Artificial Intelligence (XAI) is not optional, but a fundamental requirement to ensure responsible adoption. The National AI Supervisory Agency, the research community and civil society itself will have an important role to play here.


Content prepared by Jose Luis Marín, Senior Consultant in Data, Strategy, Innovation & Digitalization. The contents and views reflected in this publication are the sole responsibility of the author.

Blog

The Directorate-General for Traffic (DGT, by its Spanish acronym) is the body responsible for ensuring safety and smooth traffic flow on Spain's roads. Among other activities, it is responsible for issuing permits, traffic control and the management of infringements.

As a result of its activity, a large amount of data is generated, much of which is made available to the public as open data. These datasets not only promote transparency but are also a tool to encourage innovation and improve road safety through their re-use by researchers, companies, public administrations and interested citizens.

In this article we will review some of these datasets, including application examples.

How to access the DGT's datasets

DGT datasets provide detailed and structured information on various aspects of road safety and mobility in Spain, ranging from accident statistics to vehicle and driver information. The temporal continuity of these datasets, available from the beginning of the century to the present day, enables longitudinal analyses that reflect the evolution of mobility and road safety patterns in Spain.

Users can access datasets in different spaces:

  • DGT en cifras. This is a section of the Directorate-General for Traffic's website that offers centralised access to statistics and key data related to road safety, vehicles and drivers in Spain. The portal includes detailed information on accidents, complaints, the vehicle fleet and technical characteristics of vehicles, among other topics. It also provides historical and comparative analyses to assess trends and design strategies to improve road safety.
  • National Access Point (NAP). Managed by the DGT, it is a platform designed to centralise and facilitate access to road and traffic data, including updates. This portal has been created under the framework of the European Directive 2010/40/EU and brings together information provided by various traffic management bodies, road authorities and infrastructure operators. Available data includes traffic incidents, electric vehicle charging points, low-emission zones and car park occupancy, among others. It aims to promote interoperability and the development of intelligent solutions that improve road safety and transport efficiency.

While the NAP is focused on real-time data and technological solutions, DGT en cifras focuses on providing statistics and historical information for analysis and decision making. In addition, the NAP collects data not only from the DGT, but also from other agencies and private companies.

Most of these data are available through datos.gob.es.

Examples of DGT datasets

Some examples of datasets that can be found in datos.gob.es are:

  • Accidents with casualties: includes detailed information on fatalities, hospitalised and minor injuries, as well as circumstances by road type. This data helps to understand why accidents happen and who is involved, to identify risky situations and to detect dangerous behaviour on the road. It is useful for creating better prevention campaigns, detecting black spots on the roads and helping authorities to make more informed decisions. They are also of interest to public health professionals, urban planners and insurance companies working to reduce accidents and their consequences.
  • Census of drivers: provides a complete x-ray of who has a driving licence in Spain. The information is particularly useful for understanding the evolution of the driver fleet, identifying demographic trends and analysing the penetration of different types of licences by territory and gender.
  • Car registrations by make and engine capacity: shows which new cars Spaniards bought, organised by brand and engine power. The information allows consumer trends to be identified. This data is valuable for manufacturers, dealers and automotive analysts, who can study market performance in a given year. It is also useful for researchers in mobility, the environment and economics, allowing them to analyse the evolution of the Spanish vehicle fleet and its impact in terms of emissions, energy consumption and market trends.

Use cases of DGT datasets

The publication of this data in an open format enables innovation in areas such as accident prevention, the development of safer road infrastructure, the development of evidence-based public policies, and the creation of mobility-related technology applications. Some examples are given below:

DGT's own use of data

The DGT itself reuses its data to create tools that facilitate the visualisation of the information and bring it closer to citizens in a simple way. This is the case of the Traffic status and incident map, which is constantly and automatically updated with the information entered 24 hours a day by the Civil Guard Traffic Group and the heads of the Traffic Management Centres of the DGT, the Generalitat de Catalunya and the Basque Government. It includes information on roads affected by weather phenomena (e.g. ice or floods) and forecast developments.

In addition, the DGT also uses its data to carry out studies that provide information on certain aspects of mobility and road safety, which are very useful for decision-making and policy-making. One example is this study, which analyses the behaviour and activity of certain groups in traffic accidents in order to propose proactive measures. Another example is this project to implement a computer system that identifies, through geolocation, the critical points with the highest accident rates on the roads, so that they can be treated and the conclusions shared.

Use of data by third parties

The datasets provided by the DGT are also reused by other public administrations, researchers, entrepreneurs and private companies, fostering innovation in this field. Thanks to them, we find apps that allow users to access detailed information about vehicles in Spain (such as technical characteristics, inspection history and other data) or that provide information about the most dangerous places for cyclists.

In addition, its combination with advanced technologies such as artificial intelligence allows extracting even more value from the data, facilitating the identification of patterns and helping citizens and authorities to take preventive measures. One example is the Waze application, which has implemented an artificial intelligence-based functionality to identify and alert drivers to high-crash stretches of road, known as "black spots". This system combines historical accident data with analysis of road characteristics, such as gradients, traffic density and intersections, to provide accurate and highly useful alerts. The application notifies users in advance when they are approaching these dangerous areas, with the aim of reducing risks and improving road safety. This innovation complements the data provided by the DGT, helping to save lives by encouraging more cautious driving.

For those who would like to get a taste of DGT data and practise with it, datos.gob.es offers a step-by-step data science exercise. Users can analyse these datasets and use predictive models to estimate the evolution of the electric vehicle in Spain. Documented code and free-to-use tools are used for this purpose, and all the material generated is available for reuse in the GitHub repository of datos.gob.es. A hedged sketch of this kind of analysis is shown below.
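
As a hedged illustration of that kind of analysis (the file name and column names below are hypothetical; the actual datasets and code are documented in the notebook of the GitHub repository), a simple trend model for yearly electric-vehicle registrations could look like this:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hypothetical input: one row per year with the number of EV registrations.
    df = pd.read_csv("ev_registrations_by_year.csv")  # columns: year, registrations

    X = df[["year"]]
    y = df["registrations"]
    model = LinearRegression().fit(X, y)

    # Project the fitted trend five years beyond the last observed year.
    last_year = int(df["year"].max())
    future = pd.DataFrame({"year": range(last_year + 1, last_year + 6)})
    future["predicted_registrations"] = model.predict(future[["year"]])
    print(future)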

In short, the DGT's data sets offer great opportunities for reuse, even more so when combined with disruptive technologies. This is driving innovation, sustainability and safer, more connected transport, which is helping to transform urban and rural mobility.

News

Once again this year, the European Commission organised the EU Open Data Days, one of the world's leading open data and innovation events. On 19-20 March, the European Convention Centre in Luxembourg brought together experts, government officials and academics to share knowledge, experience and progress on open data in Europe.

During these two intense days, which could also be followed online, participants explored crucial topics such as governance, quality, interoperability and the impact of artificial intelligence (AI) on open data. This event has become an essential forum for fostering the development of policies and practices that promote transparency and data-driven innovation across the European Union. In this post, we review each of the presentations at the event.

Openness and data history

To begin with, the Director General of the European Union Publications Office, Hilde Hardeman, opened the event by welcoming the attendees and setting the tone for the discussions to follow. Helena Korjonen and Emma Schymanski, two experts from the University of Luxembourg, then presented a retrospective entitled "A data journey: from darkness to enlightenment", exploring the evolution of data storage and sharing over 18,000 years. From cave paintings to modern servers, this historical journey highlighted how many of today's open data challenges, such as ownership, preservation and accessibility, have deep roots in human history.

This was followed by a presentation by Slava Jankin, Professor at the Centre for AI in Government at the University of Birmingham, on AI-driven digital twins and open data to create dynamic simulations of governance systems, which allow policymakers to test reforms and predict outcomes before implementing them.

Use cases between open data and AI

On the other hand, several use cases were also presented, such as Lithuania's practical experience in the comprehensive cataloguing of public data. Milda Aksamitauskas of the University of Wisconsin addressed the governance challenges and communication strategies employed in the project and presented lessons on how other countries could adapt similar methods to improve transparency and data-driven decision-making.

Relatedly, scientific coordinator Bastiaan van Loenen presented the findings of ODECO, the Horizon 2020 project he is working on, which focuses on the creation of sustainable open data ecosystems. As van Loenen explained, the research, conducted over four years by 15 researchers, has explored user needs and governance structures for seven different groups, highlighting how circular, inclusive and skills-based approaches can provide economic and social value to open data ecosystems.

In addition, artificial intelligence was at the forefront throughout the event. Assistant Professor Anastasija Nikiforova from the University of Tartu offered a revealing insight into how artificial intelligence can transform government open data ecosystems. In her presentation, "Data for AI or AI for data", she explored eight different roles that AI can play. For example, AI can serve as an open data portal "cleanser" and even retrieve data from the ecosystem, providing valuable insights for policymakers and researchers on how to effectively leverage AI in open data initiatives.

Also using AI-powered tools, we find the EU Open Research Repository launched by Zenodo in 2024, an open science initiative that provides a tailored research repository for EU research funding recipients. Lars Holm Nielsen's presentation highlighted how AI-driven tools and high-quality open datasets reduce the cost and effort of data cleaning, while ensuring adherence to the FAIR principles.

The day continued with a speech by Maroš Šefčovič, European Commissioner for Trade and Economic Security, Inter-institutional Relations and Transparency, who underlined the European Commission's commitment to open data as a key pillar for transparency and innovation in the European Union.

Interoperability and data quality

After a break, Georges Lobo and Pavlina Fragkou, programme and project coordinators of SEMIC respectively, explained how the Semantic Interoperability Centre Europe (SEMIC) improves interoperable data exchange in Europe through the Data Catalogue Vocabulary Application Profile (DCAT-AP) and Linked Data Event Streams (LDES). Their presentation highlighted how these standards facilitate the efficient publication and consumption of data, with case studies such as the Rijksmuseum and the European Union Railway Agency demonstrating their value in fostering interoperable and sustainable data ecosystems.

Barbara Šlibar from the University of Zagreb then provided a detailed analysis of metadata quality in European open datasets, revealing significant disparities in five key dimensions. Her study, based on random samples from data.europa.eu, underlined the importance of improving metadata practices and raising awareness among stakeholders to improve the usability and value of open data in Europe.

Then, Bianca Sammer, from the Bavarian Agency for Digital Affairs, shared her experience creating Germany's open data portal in just one year. Her presentation, "Unlocking the Potential", highlighted innovative solutions to overcome challenges in open data management: for example, automated improvement of metadata quality, a reusable open source infrastructure and participation strategies for public administrations and users.

Open data today and on the horizon

The second day started with interventions by Rafał Rosiński, Undersecretary of State at the Ministry of Digital Affairs of Poland, who presented the Polish Presidency's perspective on open data and digital transformation, and Roberto Viola, Director General of the European Commission's Directorate-General for Communication Networks, Content and Technology, who spoke about the European path to digital innovation.

After the day's opening presentations, the talks on use cases and innovative proposals in open data began. First, Stefaan Verhulst, co-founder of the New York governance lab GovLab, dubbed the historic moment we are living through the "fourth wave of open data", characterised by the integration of generative artificial intelligence with open data to address social challenges. His presentation raised crucial questions about how AI-based conversational interfaces can improve accessibility, what it means for open data to be "AI-ready" and how to build sustainable data-driven solutions that balance openness and trust.

Christos Ellinides, Director General for Translation at the European Commission, then highlighted the importance of language data for AI on the continent. With 25 years of data spanning multiple languages and the expertise to develop multilingual services based on artificial intelligence, the Commission is at the forefront in the field of linguistic data spaces and in the use of European high-performance computing infrastructures to exploit data and AI.

Open data re-use use cases

Reuse brings multiple benefits. Kjersti Steien, from the Norwegian digitisation agency, presented Norway's national data portal, data.norge.no, which employs an AI-powered search engine to improve data discoverability. Using Google Vertex, the engine allows users to find relevant datasets without needing to know the exact terms used by data providers, demonstrating how AI can improve data reuse and adapt to emerging language models.

Beyond Norway, use cases from other cities and countries were also discussed. Sam Hawkins, Ember's UK Data Programme Manager, underlined the importance of open energy data in advancing the clean energy transition and ensuring system flexibility.

Another case, which leverages urban data and cross-sector collaboration to improve sustainability and community impact, was presented by Marika Eik from the University of Estonia. Her session examined a city-level approach to sustainability metrics and CO2 footprint calculations, drawing on data from municipalities, real estate operators, research institutions and mobility analysts to provide replicable models for improving environmental responsibility.

Raphaël Kergueno of Transparency International EU explained how Integrity Watch EU leverages open data to improve transparency and accountability in the Union. This initiative re-uses datasets such as the EU Transparency Register and the European Commission's meeting records to increase public awareness of lobbying activities and improve legislative oversight, demonstrating the potential of open data to strengthen democratic governance.

Also, Kate Larkin of the European Marine Observatory presented the European Marine Observation and Data Network, highlighting how pan-European marine data services that adhere to the FAIR principles contribute to initiatives such as the European Green Deal, maritime spatial planning and the blue economy. Her presentation showed practical use cases demonstrating the integration of marine data into wider data ecosystems such as the European Digital Twin Ocean.

Data visualisation and communication

In addition to use cases, the EU Open Data Days 2025 highlighted data visualisation as a mechanism to bring open data to the people. In this vein, Antonio Moneo, CEO of Tangible Data, explored how transforming complex datasets into physical sculptures fosters data literacy and community engagement.

On the other hand, Jan Willem Tulp, founder of TULP interactive, examined how visual design influences the perception of data. His session explored how design elements such as colour, scale and focus can shape narratives and potentially introduce bias, highlighting the responsibilities of data visualisers to maintain transparency while crafting compelling visual narratives.

Education and data literacy

Davide Taibi, researcher at the Italian National Research Council, shared experiences on the integration of data literacy and AI in educational pathways, based on EU-funded projects such as DATALIT, DEDALUS and SMERALD. These initiatives piloted digitally enhanced learning modules in higher education, secondary schools and vocational training in several EU Member States, focusing on competence-oriented approaches and IT-based learning systems.

Nadieh Bremer, founder of Visual Cinnamon, explored how creative approaches to data visualisation can reveal the intricate bonds between people, cultures and concepts. Examples included a family tree of 3,000 European royals, relationships in UNESCO's Intangible Cultural Heritage and cross-cultural constellations in the night sky, demonstrating how iterative design processes can uncover hidden patterns in complex networks.

Digital artist Andreas Refsgaard closed the presentations with a reflection on the intersection of generative AI, art and data science. Through artistic and engaging examples, he invited the audience to reflect on the vast potential and ethical dilemmas arising from the growing influence of digital technologies in our daily lives.

In summary, the EU Open Data Days 2025 have once again demonstrated the importance of these meetings in driving the evolution of the open data ecosystem in Europe. The discussions, presentations and case studies shared during these two days have highlighted not only the progress made, but also the remaining challenges and emerging opportunities. In a context where artificial intelligence, sustainability and citizen participation are transforming the way we use and value data, events like this one are essential to foster collaboration, share knowledge and develop strategies that maximise the social and economic value of open data. The continued engagement of European institutions, national governments, academia and civil society will be essential to build a more robust, accessible and impactful open data ecosystem that responds to the challenges of the 21st century and contributes to the well-being of all European citizens.

You can revisit the recordings of each lecture here.
