On September 14th, the II National Open Data Meeting took place under the theme "Urgent Call to Action for the Environment" at the Pignatelli building in Zaragoza, headquarters of the Government of Aragon. The event, held in person in the Crown Room, allowed attendees to participate and exchange ideas in real time.
The event continued the tradition started in 2022 in Barcelona, establishing itself as one of the main gatherings in Spain in the field of public sector data reuse. María Ángeles Rincón, Director-General of Electronic Administration and Corporate Applications of the Government of Aragon, inaugurated the event, emphasizing the importance of open data in terms of transparency, reuse, economic development, and social development. She highlighted that high-quality and neutral data available on open data portals are crucial for driving artificial intelligence and understanding our environmental surroundings.
The day continued with a presentation by María Jesús Fernández Ruiz, Head of the Technical Office of Open Government of the City of Zaragoza, titled "Why Implement Data Governance in Our Institutions?" In her presentation, she stressed the need to manage data as a strategic asset and a public good, integrating them into governance and management policies. She also emphasized the importance of interoperability and the reuse of large volumes of data to turn them into knowledge, as well as the formation of interdisciplinary teams for data management and analysis.
The event included three panel discussions with the participation of professionals, experts, and scientists related to the management, publication, and use of open data, focusing on environmental data.
The first panel discussion highlighted the value of open data for understanding the environment we live in. In this video, you can revisit the panel discussion moderated by Borja Carvajal of the Diputación de Castellón: II National Open Data Meeting, Zaragoza, September 14, 2023 (morning session).
Secondly, Magda Lorente from the Diputación de Barcelona moderated the discussion "Open Data, Algorithms, and Artificial Intelligence: How to Combat Environmental Disinformation?" This second panel featured professionals from data journalism, science, and the public sector who discussed the opportunities and challenges of disseminating environmental information through open data.

Conclusions from Challenges 1 and 2 on Open Data: Interadministrative Collaboration and Professional Competencies
After the second panel discussion, the conclusions of Challenges 1 and 2 on open data were presented, two lines of work defined at the I National Open Data Meeting held in 2022.
In last year's conference, several challenges were identified in the field of open data. The first of them (Challenge 1) involved promoting collaboration between administrations to facilitate the opening of datasets and generate valuable exchanges for both parties. To address this challenge, work was carried out throughout the year to establish the appropriate lines of action.
You can download the document summarizing the conclusions of Challenge 1 here: https://opendata.aragon.es/documents/90029301/115623550/Reto_1_encuentro_datos_Reto_1.pptx
Challenge 2, for its part, aimed to define the professional roles, as well as the essential knowledge and competencies, that public employees taking on tasks related to data opening should have.
To address this second challenge, a working group of professionals with expertise in the sector was also established, all pursuing the same goal: to promote the dissemination of open data and thus improve public policies by involving citizens and businesses throughout the opening process.
To resolve the key issues raised, the group addressed two related lines of work:
- Defining competencies and basic knowledge in the field of open data for different public professional profiles involved in data opening and use.
- Identifying and compiling existing training materials and pathways to provide workers with a starting point.
Key Professional Competencies for Data Opening
To specify the set of actions and attitudes that a worker should have to carry out their work with open data, it was considered necessary to identify the main profiles needed in the administration, as well as the specific needs of each position. In this regard, the working group has based its analysis on the following roles:
- Open Data Manager role: responsible for technical leadership in promoting open data policies, data policy definition, and data model activities.
- Technical role in data opening (IT profile): carries out execution activities related to system management, data extraction processes, data cleaning, and similar tasks.
- Functional role in data opening (service technician): carries out execution activities related to selecting the data to be published, data quality, promotion of open data, visualization, and data analytics.
- Use of data by public workers: performs activities involving data use for decision-making, basic data analytics, and similar tasks.
After analyzing the functions of each of these roles, the team established the competencies and knowledge necessary to perform the functions defined for each of them.
You can download the document with conclusions about professional capabilities for data opening here: https://opendata.aragon.es/documents/90029301/115623550/reto+2_+trabajadores+p%C3%BAblicos+capacitados+para+el+uso+y+la+apertura+de+datos.docx
Training Materials and Pathways on Open Data
In line with the second line of work, the team of professionals has developed an inventory of online training resources in the field of open data, which can be accessed for free. This list includes courses and materials in Spanish, co-official languages, and English, covering topics such as open data, their processing, analysis, and application.
You can download the document listing training materials, the result of the work of Challenge 2's group, here: https://opendata.aragon.es/datos/catalogo/dataset/listado-de-materiales-formativos-sobre-datos-abiertos-fruto-del-trabajo-del-grupo-del-reto-2
In conclusion, the working group considered that the progress made during this first year marks a solid start, which will serve as a basis for administrations to design training and development plans aimed at the different roles involved in data opening. This, in turn, will contribute to strengthening and improving data policies in these entities.
Furthermore, it was noted that the effort invested over these months in identifying training resources will be key to helping public workers acquire essential knowledge. It was also highlighted that a large number of free and open training resources exist at a basic level of specialization; however, the need to develop more advanced materials, to train the professionals that the administration needs today, was also identified.
The third panel discussion, moderated by Vicente Rubio from the Diputación de Castellón, focused on public policies based on data to improve the living environment of its inhabitants.
At the end of the meeting, it was emphasized how important it is to continue working on and shaping different challenges related to the functions and services of open data portals and data opening processes. In the III National Open Data Meeting to be held next year in the Province of Castellón, progress in this area will be presented.
This free software application offers a map with all the trees in the city of Barcelona geolocated by GPS. The user can access in-depth information on the subject. For example, the program identifies the number of trees in each street, their condition and even the species.
The application's developer, Pedro López Cabanillas, has used datasets from Barcelona's open data portal (Open Data Barcelona) and states, in his blog, that it can be useful for botany students or "curious users". The Barcelona Trees application is now in its third beta version.
The program uses the Qt framework with the C++ and QML languages, and can be built (using a suitably modern compiler) for the most common targets: the Windows, macOS, Linux and Android operating systems.
Climate Modeling and Prediction: planning for a Sustainable Future
Climate models make it possible to predict how the climate will change in the future and, when properly trained, also help to identify potential impacts in specific regions. This enables governments and communities to take measures to adapt to rapidly changing conditions.
Increasingly, these models are fed by open datasets, and some climate models have even begun to be published freely and openly. In this line, we find the climate models published on the MIT Climate portal or the data and models published by NOAA Climate.gov. In this way, all kinds of institutions, scientists and even citizens can contribute to identifying possibilities for mitigating the effects of climate change.
Carbon emissions monitoring: carbon footprint tracking
Thanks to open data and some paid-for datasets, it is now possible to accurately track the carbon emissions of countries, cities and even companies on an ongoing basis. As exemplified by the International Energy Agency's (IEA) World Energy Outlook 2022 or the U.S. Environmental Protection Agency's Global Greenhouse Gas Emissions Data, these data are essential not only for measuring and analyzing emissions globally, but also for assessing progress towards emission reduction targets.
Adapting Agriculture: cultivating a resilient future
It is clear that climate change has a direct impact on agriculture, and that this impact threatens global food security, which is in itself already a global challenge. Open data on weather patterns, rainfall and temperatures, land use, and fertilizer and pesticide use, coupled with local data captured in the field, allow farmers to adapt their practices and evolve towards a model of precision agriculture. Choosing crops that are resilient to changing conditions, and managing inputs more efficiently thanks to this data, is crucial to ensure that agriculture remains sustainable and productive in the new scenarios.
Among other organizations, the Food and Agriculture Organization of the United Nations (FAO) highlights the importance of open data in climate-smart agriculture and publishes datasets on pesticide use, inorganic fertilizers, greenhouse gas emissions, agricultural production, etc., which contribute to improved land, water and food security management.
Natural Disaster Response: minimizing Impact
The analysis of data on extreme weather events, such as hurricanes or floods, makes it possible to design strategies that lead to a faster and more effective response when these events occur. In this way, on the one hand, lives are saved and, on the other, the high impact on affected communities is partially mitigated.
Open data such as those published by the US National Hurricane Center (NHC) or the European Environment Agency are valuable tools in natural disaster management as they help streamline disaster preparedness decision-making and provide an objective basis for assessment and prioritization.
Biodiversity and conservation: protecting our natural wealth
While it seems clear that biodiversity is vital to the health of the Earth, human activity continues to put it under great pressure, combining with climate change to threaten its stability. Open data on species populations, deforestation and other ecological indicators such as those published by governments and organizations around the world in the Global Biodiversity Information Facility (GBIF) help us to identify areas at risk more quickly and accurately and thus prioritize conservation efforts.
With the increased availability of open data, governments, institutions, companies and citizens can make informed decisions to mitigate the consequences of climate change and work together towards a more sustainable future.
Content prepared by Jose Luis Marín, Senior Consultant in Data, Strategy, Innovation & Digitalization.
The contents and points of view reflected in this publication are the sole responsibility of its author.
The "Stories of Use Cases" series, organized by the European Open Data portal (data.europe.eu), is a collection of online events focused on the use of open data to contribute to common European Union objectives such as consolidating democracy, boosting the economy, combating climate change, and driving digital transformation. The series comprises four events, and all recordings are available on the European Open Data portal's YouTube channel. The presentations used to showcase each case are also published.
In a previous post on datos.gob.es, we explained the applications presented in two of the series' events, specifically those related to the economy and democracy. Now, we focus on use cases related to climate and technology, as well as the open datasets used for their development.
Open data has enabled the development of applications offering diverse information and services. In terms of climate, some applications trace waste management processes or visualize relevant data about organic agriculture. Meanwhile, the application of open data in the technological sphere facilitates process management. Discover the examples highlighted by the European Open Data portal!
Open Data for Fulfilling the European Green Deal
The European Green Deal is a strategy by the European Commission aiming to achieve climate neutrality in Europe by 2050 and promote sustainable economic growth. To reach this objective, the European Commission is working on various actions, including reducing greenhouse gas emissions, transitioning to a circular economy, and improving energy efficiency. Under this common goal and utilizing open datasets, three applications have been developed and presented in one of the webinars of the series on data.europe.eu use cases: Eviron Mate, Geofluxus, and MyBioEuBuddy.
- Eviron Mate: It's an educational project aimed at raising awareness among young people about climate change and related data. To achieve this goal, Eviron Mate utilizes open data from Eurostat, the Copernicus Program and data.europa.eu.
- Geofluxus: This initiative tracks waste from its origin to its final destination to promote material reuse and reduce waste volume. Its main objective is to extend material lifespan and provide businesses with tools for better waste management decisions. Geofluxus uses open data from Eurostat and various national open data portals.
- MyBioEuBuddy is a project offering information and visualizations about sustainable agriculture in Europe, using open data from Eurostat and various regional open data portals.
The Role of Open Data in Digital Transformation
In addition to contributing to the fight against climate change by monitoring environment-related processes, open data can yield interesting outcomes in other digitally-operating domains. The combination of open data with innovative technologies such as natural language processing, artificial intelligence, or augmented reality provides valuable results.
Another online seminar from the series, presented by the European Data Portal, delved into this theme: driving digital transformation in Europe through open data. During the event, three applications that combine cutting-edge technology and open data were presented: Big Data Test Infrastructure, Lobium, and 100 Europeans.
- "Big Data Test Infrastructure (BDTI)": This is a European Commission tool featuring a cloud platform to facilitate the analysis of open data for public sector administrations, offering a free and ready-to-use solution. BDTI provides open-source tools that promote the reuse of public sector data. Any public administration can request the free advisory service by filling out a form. BDTI has already aided some public sector entities in optimizing procurement processes, obtaining mobility information for service redesign, and assisting doctors in extracting knowledge from articles.
- Lobium: A website assisting public affairs managers in addressing the complexities of their tasks. Its aim is to provide tools for campaign management, internal reporting, KPI measurement, and government affairs dashboards. Ultimately, its solution leverages digital tools' advantages to enhance and optimize public management.
- 100 Europeans: An application that simplifies European statistics, dividing the European population into 100 individuals. Through scrolling navigation, it presents data visualizations with figures related to healthy habits and consumption in Europe.
These six applications are examples of how open data can be used to develop solutions of societal interest. Discover more use cases created with open data in this article we have published on datos.gob.es.

Learn more about these applications in their seminars -> Recordings here
Public administrations (PAs) have the obligation to publish their open datasets in reusable formats, as dictated by European Directive 2019/1024, transposed in Spain through the amendment of Law 37/2007 of November 16 on the reuse of public sector information. This regulation, aligned with the European Union's Data Strategy, stipulates that PAs must have their own catalogs of open data to promote the use and reuse of public information.
One of these catalogs is the Canary Islands Open Data Portal, which contains over 7,450 open, free, and reusable datasets from up to 15 organizations within the autonomous community. The Ministry of Agriculture, Livestock, Fisheries, and Food Sovereignty (CAGPSA) of the Government of the Canary Islands is part of this list. As part of its Open Government initiative, CAGPSA has strongly promoted the opening of its data.
Through a process of analysis, refinement, and normalization of the data, CAGPSA has successfully published over 20 datasets on the portal, thus ensuring the quality of information reuse by any interested party.
Analysis, data normalization, and data opening protocol for the Government of the Canary Islands
To achieve this milestone in data management, the Ministry of Agriculture, Livestock, Fisheries, and Food Sovereignty of the Government of the Canary Islands has developed and implemented a data opening protocol, which includes tasks such as:
- Inventory creation and prioritization of data sources for publication.
- Analysis, refinement, and normalization of prioritized datasets.
- Requesting the upload of datasets to the Canary Islands Open Data Portal.
- Addressing requests related to the published datasets.
- Updating published datasets.
Data normalization has been a key factor for the Ministry, taking into account international semantic assets (including classifications from the United Nations and its agencies, as well as Eurostat) and applying guidelines defined in international standards such as SDMX or those set by datos.gob.es, to ensure the quality of the published data.
CAGPSA has not only put effort into data normalization and publication, but has also supported the ministry's personnel in the management and maintenance of the data, offering training and awareness sessions. Furthermore, it has created a manual for data reuse, outlining guidelines based on European and national directives regarding open data and the reuse of public sector information. This manual helps address the concerns of the ministry's staff regarding the publication of personal or commercial data.
As a result of this work, the Ministry has actively collaborated with the Canary Islands Open Data Portal in publishing datasets and defining the data opening protocol established for the entire Government of the Canary Islands.
Commitment to Quality and Information Reuse
CAGPSA has been particularly recognized for the publication of the Agricultural Transformation Societies (SAT) dataset, which ranked among the top 3 datasets by the Multisectorial Information Association (ASEDIE) in 2021. This initiative has been praised by the association on multiple occasions for its focus on data quality and management.
Its efforts in data normalization, support for the ministry's staff, collaboration with the open data portal, and extensive array of published datasets position CAGPSA as a reference in this field within the Canary Islands autonomous community.
At datos.gob.es, we applaud these kinds of examples and highlight the good practices in data opening by public administrations. The initiative of the Ministry of Agriculture, Livestock, Fisheries, and Food Sovereignty of the Government of the Canary Islands is a significant step that brings us closer to the advantages that open data and its reuse offer to the citizens. The Ministry's commitment to data openness contributes to the European and national goal of achieving a data-driven administration.
Open data is a valuable tool for making informed decisions that encourage the success of a process and enhance its effectiveness. From a sectorial perspective, open data provides relevant information about the legal, educational, or health sectors. All of these, along with many other areas, utilize open sources to measure improvement compliance or develop tools that streamline work for professionals.
The benefits of using open data are extensive, and their variety goes hand in hand with technological innovation: every day, more opportunities arise to employ open data in the development of innovative solutions. An example of this can be seen in urban development aligned with the sustainability values advocated by the United Nations (UN).
Cities cover only 3% of the Earth's surface; however, they emit 70% of carbon emissions and consume over 60% of the world's resources, according to the UN. In 2023, more than half of the global population lives in cities, and this figure is projected to keep growing. By 2030, it is estimated that over 5 billion people will live in cities, meaning more than 60% of the world's population.
Despite this trend, infrastructures and neighborhoods do not meet the appropriate conditions for sustainable development, and the goal is to "Make cities and human settlements inclusive, safe, resilient, and sustainable," as recognized in Sustainable Development Goal (SDG) number 11. Proper planning and management of urban resources are significant factors in creating and maintaining sustainability-based communities. In this context, open data plays a crucial role in measuring compliance with this SDG and thus achieving the goal of sustainable cities.
In conclusion, open data stands as a fundamental tool for the strengthening and progress of sustainable city development.
In this infographic, we have gathered use cases that utilize open datasets to monitor and/or enhance energy efficiency, transportation and urban mobility, air quality, and noise levels: all issues that contribute to the proper functioning of urban centers.
Click on the infographic to view it in full size.
The humanitarian crisis following the earthquake in Haiti in 2010 was the starting point for a voluntary initiative to create maps to identify the level of damage and vulnerability by area, and thus coordinate emergency teams. Since then, the collaborative mapping project known as Hot OSM (Humanitarian OpenStreetMap Team) has played a key role in crisis situations and natural disasters.
Now, the organisation has evolved into a global network of volunteers who contribute their online mapping skills to help in crisis situations around the world. The initiative is an example of data-driven collaboration to solve societal problems, a theme we explore in this datos.gob.es report.
Hot OSM works to accelerate data-driven collaboration with humanitarian and governmental organisations, as well as local communities and volunteers around the world, to provide accurate and detailed maps of areas affected by natural disasters or humanitarian crises. These maps are used to help coordinate emergency response, identify needs and plan for recovery.
In its work, Hot OSM prioritises collaboration and empowerment of local communities. The organisation works to ensure that people living in affected areas have a voice and power in the mapping process. This means that Hot OSM works closely with local communities to ensure that areas important to them are mapped. In this way, the needs of communities are considered when planning emergency response and recovery.
Hot OSM's educational work
In addition to its work in crisis situations, Hot OSM is dedicated to promoting access to free and open geospatial data, and works in collaboration with other organisations to build tools and technologies that enable communities around the world to harness the power of collaborative mapping.
Through its online platform, Hot OSM provides free access to a wide range of tools and resources to help volunteers learn and participate in collaborative mapping. The organisation also offers training for those interested in contributing to its work.
One example of a HOT project is the work the organisation carried out in the context of Ebola in West Africa. In 2014, an Ebola outbreak affected several West African countries, including Sierra Leone, Liberia and Guinea. The lack of accurate and detailed maps in these areas made it difficult to coordinate the emergency response.
In response to this need, HOT initiated a collaborative mapping project involving more than 3,000 volunteers worldwide. Volunteers used online tools to map Ebola-affected areas, including roads, villages and treatment centres.
This mapping allowed humanitarian workers to better coordinate the emergency response, identify high-risk areas and prioritize resource allocation. In addition, the project also helped local communities to better understand the situation and participate in the emergency response.
This case in West Africa is just one example of HOT's work around the world to assist in humanitarian crisis situations. The organisation has worked in a variety of contexts, including earthquakes, floods and armed conflict, and has helped provide accurate and detailed maps for emergency response in each of these contexts.
On the other hand, the platform is also involved in areas where there is no map coverage, such as in many African countries. In these areas, humanitarian aid projects are often very challenging in the early stages, as it is very difficult to quantify what population is living in an area and where they are located. Having the location of these people and showing access routes "puts them on the map" and allows them to gain access to resources.
In the Nature article "The evolution of humanitarian mapping within the OpenStreetMap community", we can see some of the platform's achievements presented graphically.

How to collaborate
It is easy to start collaborating with Hot OSM, just go to https://tasks.hotosm.org/explore and see the open projects that need collaboration.
This screen offers many options when searching for projects, which can be filtered by level of difficulty, organisation, location or interests, among others.
To participate, simply click on the Register button.

Give a name and an e-mail address on the next screen:

It will ask us if we have already created an account in OpenStreetMap or if we want to create one.
If we want to see the process in more detail, this website makes it very easy.
Once the user has been created, on the learning page we find help on how to participate in the project.
It is important to note that the contributions of the volunteers are reviewed and validated: there is a second level of volunteers, the validators, who check the work of the beginners. During the development of the tool, the HOT team has taken great care to make it a user-friendly application, so as not to limit its use to people with computer skills.
In addition, organisations such as the Red Cross and the United Nations regularly organise mapathons to bring together groups of people for specific projects or to teach new volunteers how to use the tool. These meetings serve, above all, to remove the new users' fear of "breaking something" and to allow them to see how their voluntary work serves concrete purposes and helps other people.
Another of the project's great strengths is that it is based on free software and allows for its reuse. In the MissingMaps project's Github repository we can find the code and if we want to create a community based on the software, the Missing Maps organisation facilitates the process and gives visibility to our group.
In short, Hot OSM is a citizen science and data altruism project that contributes to bringing benefits to society through the development of collaborative maps that are very useful in emergency situations. This type of initiative is aligned with the European concept of data governance that seeks to encourage altruism to voluntarily facilitate the use of data for the common good.
Content by Santiago Mota, senior data scientist.
The contents and views reflected in this publication are the sole responsibility of the author.
1. Introduction
Visualizations are graphical representations of data that allow the information linked to them to be communicated in a simple and effective way. The visualization possibilities are very wide, from basic representations, such as line, bar or pie charts, to visualizations configured on interactive dashboards.
In this "Step-by-Step Visualizations" section we are regularly presenting practical exercises of open data visualizations available on datos.gob.es or similar catalogs. They address and describe in a simple way the stages necessary to obtain the data, perform the transformations and analysis that are relevant to and finally, the creation of interactive visualizations; from which we can extract information summarized in final conclusions. In each of these practical exercises, simple and well-documented code developments are used, as well as free to use tools. All generated material is available for reuse in GitHub's Data Lab repository.
Below, you can access the material that we will use in the exercise and that we will explain and develop in the following sections of this post.
Access the data lab repository on Github.
Run the data pre-processing code on top of Google Colab.
2. Objective
The main objective of this exercise is to analyze the meteorological data collected at several stations in recent years. To perform this analysis, we will use different visualizations generated with the "ggplot2" library of the programming language "R".
Of all the Spanish weather stations, we have decided to analyze two of them: one in the coldest province of the country (Burgos) and another in the warmest (Córdoba), according to data from AEMET, the Spanish State Meteorological Agency. Patterns and trends will be sought in the records between 1990 and 2020 to understand the meteorological evolution over this period.
Once the data has been analyzed, we can answer questions such as those shown below:
- What is the trend in the evolution of temperatures in recent years?
- What is the trend in the evolution of rainfall in recent years?
- Which weather station (Burgos or Córdoba) presents a greater variation of climatological data in recent years?
- What degree of correlation is there between the different climatological variables recorded?
These and many other questions can be answered by using tools such as ggplot2, which facilitate the interpretation of data through visualizations.
3. Resources
3.1. Datasets
The datasets contain different meteorological information of interest for the two stations in question, broken down by year. We can download them from the AEMET download center, after requesting an API key, in the section "monthly / annual climatologies". From the existing weather stations, we have selected the two from which we will obtain the data: Burgos airport (2331) and Córdoba airport (5402).
It should be noted that, along with the datasets, we can also download their metadata, which are of special importance when identifying the different variables registered in the datasets.
These datasets are also available in the Github repository.
3.2. Tools
To carry out the data preprocessing tasks, we used the R programming language in a Jupyter Notebook hosted on the Google Colab cloud service.
"Google Colab" or, also called Google Colaboratory, is a cloud service from Google Research that allows you to program, execute and share code written in Python or R on a Jupyter Notebook from your browser, so it does not require configuration. This service is free of charge.
For the creation of the visualizations, the ggplot2 library has been used.
"ggplot2" is a data visualization package for the R programming language. It focuses on the construction of graphics from layers of aesthetic, geometric and statistical elements. ggplot2 offers a wide range of high-quality statistical charts, including bar charts, line charts, scatter plots, box and whisker charts, and many others.
If you want to know more about tools that can help you with the treatment and visualization of data, you can consult the report "Data processing and visualization tools".
4. Data processing or preparation
The processes described below are commented in the Notebook, which you can also run from Google Colab.
Before embarking on building an effective visualization, we must carry out a prior treatment of the data, paying special attention to obtaining them and validating their content, ensuring that they are in the appropriate and consistent format for processing and that they do not contain errors.
As a first step of the process, once the necessary libraries have been imported and the datasets loaded, it is necessary to perform an exploratory analysis of the data (EDA) in order to properly interpret the starting data, detect anomalies, missing data or errors that could affect the quality of the subsequent processes and results. If you want to know more about this process, you can resort to the Practical Guide of Introduction to Exploratory Data Analysis.
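As an illustrative sketch of this first stage, an initial exploration in R might look like the following (the file name here is hypothetical; the actual datasets come from the AEMET download center and are loaded in the Notebook):

```r
# Load the Burgos dataset (hypothetical file name; the real CSV
# is obtained from the AEMET download center)
datos_burgos <- read.csv("climatologia_burgos_2331.csv", fileEncoding = "UTF-8")

# Exploratory analysis: structure, summary statistics and missing values
str(datos_burgos)
summary(datos_burgos)
colSums(is.na(datos_burgos))
```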
The next step is to generate the preprocessed data tables that we will use in the visualizations. To do this, we will filter the initial data sets and calculate the values that are necessary and of interest for the analysis carried out in this exercise.
Once the preprocessing is finished, we will obtain the data tables "datos_graficas_C" and "datos_graficas_B" which we will use in the next section of the Notebook to generate the visualizations.
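The filtering and selection logic might follow a pattern like this sketch, where the column names (anio, tm_mes, tm_min, tm_max, p_sum) are assumptions inspired by the AEMET metadata rather than the exact names used in the Notebook:

```r
library(dplyr)

# Build the preprocessed table for Burgos: keep the 1990-2020 window
# and the climatological variables used in the visualizations
datos_graficas_B <- datos_burgos %>%
  filter(anio >= 1990, anio <= 2020) %>%
  select(anio, tm_mes, tm_min, tm_max, p_sum) %>%
  arrange(anio)
```

The table for Córdoba, datos_graficas_C, would be built in the same way from its own source dataset.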
The structure of the Notebook, in which the steps previously described are carried out together with explanatory comments for each of them, is as follows:
- Installation and loading of libraries.
- Loading datasets.
- Exploratory Data Analysis (EDA).
- Preparing the data tables.
- Visualizations.
- Saving graphics.
You will be able to reproduce this analysis, as the source code is available in our GitHub account. The code is provided through a Jupyter Notebook which, once loaded into the development environment, you can easily run or modify. Due to the informative nature of this post, and in order to favor understanding by non-specialized readers, the code is not intended to be the most efficient, but rather to facilitate comprehension; you will therefore probably come up with many ways to optimize the proposed code to achieve similar purposes. We encourage you to do so!
5. Visualizations
Various types of visualizations and graphs have been made to extract information on the tables of preprocessed data and answer the initial questions posed in this exercise. As mentioned previously, the R "ggplot2" package has been used to perform the visualizations.
The "ggplot2" package is a data visualization library in the R programming language. It was developed by Hadley Wickham and is part of the "tidyverse" package toolkit. The "ggplot2" package is built around the concept of "graph grammar", which is a theoretical framework for building graphs by combining basic elements of data visualization such as layers, scales, legends, annotations, and themes. This allows you to create complex, custom data visualizations with cleaner, more structured code.
If you want to have a summary view of the possibilities of visualizations with ggplot2, see the following "cheatsheet". You can also get more detailed information in the following "user manual".
5.1. Line charts
Line charts are a graphical representation of data that uses points connected by lines to show the evolution of a variable in a continuous dimension, such as time. The values of the variable are represented on the vertical axis and the continuous dimension on the horizontal axis. Line charts are useful for visualizing trends, comparing evolutions, and detecting patterns.
Next, we can visualize several line graphs with the temporal evolution of the values of average, minimum and maximum temperatures of the two meteorological stations analyzed (Córdoba and Burgos). On these graphs, we have introduced trend lines to be able to observe their evolution in a visual and simple way.
To compare the evolutions, not only visually through the plotted trend lines but also numerically, we obtain the slope coefficients of each trend line, that is, the change in the response variable (tm_mes, tm_min, tm_max) for each unit of change in the predictor variable (year).
- Average temperature slope coefficient, Córdoba: 0.036
- Average temperature slope coefficient, Burgos: 0.025
- Minimum temperature slope coefficient, Córdoba: 0.020
- Minimum temperature slope coefficient, Burgos: 0.020
- Maximum temperature slope coefficient, Córdoba: 0.051
- Maximum temperature slope coefficient, Burgos: 0.030
We can interpret these values as follows: the higher the coefficient, the steeper the temperature rise over the period observed.
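By way of orientation, a chart of this kind and its slope coefficient might be obtained with code along these lines (reusing the column names assumed in the preprocessing sketch):

```r
library(ggplot2)

# Evolution of the average annual temperature in Burgos with a linear trend
ggplot(datos_graficas_B, aes(x = anio, y = tm_mes)) +
  geom_line(color = "steelblue") +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Average annual temperature - Burgos",
       x = "Year", y = "Temperature (ºC)")

# Slope of the trend line: change in temperature per unit change in year
coef(lm(tm_mes ~ anio, data = datos_graficas_B))["anio"]
```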
Finally, we have created a line graph for each weather station, in which we jointly visualize the evolution of average, minimum and maximum temperatures over the years.
The main conclusions obtained from the visualizations of this section are:
- The average, minimum and maximum annual temperatures recorded in Córdoba and Burgos have an increasing trend.
- The most significant increase is observed in the evolution of the maximum temperatures of Córdoba (slope coefficient = 0.051)
- The slightest increase is observed in the evolution of the minimum temperatures, both in Córdoba and Burgos (slope coefficient = 0.020)
5.2. Bar charts
Bar charts are a graphical representation of data that uses rectangular bars to show the magnitude of a variable in different categories or groups. The height or length of the bars represents the amount or frequency of the variable, and the categories are represented on the horizontal axis. Bar charts are useful for comparing the magnitude of different categories and for visualizing differences between them.
We have generated two bar graphs with the data corresponding to the total accumulated precipitation per year for the different weather stations.
As in the previous section, we plot the trend line and calculate the slope coefficient.
- Slope coefficient for accumulated rainfall Córdoba: -2.97
- Slope coefficient for accumulated rainfall Burgos: -0.36
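By way of example, such a bar chart and its slope coefficient might be generated as follows (same assumed column names as before):

```r
library(ggplot2)

# Annual accumulated precipitation in Burgos with a linear trend line
ggplot(datos_graficas_B, aes(x = anio, y = p_sum)) +
  geom_col(fill = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Accumulated rainfall - Burgos",
       x = "Year", y = "Precipitation (mm)")

# Slope of the trend line: change in accumulated rainfall per year
coef(lm(p_sum ~ anio, data = datos_graficas_B))["anio"]
```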
The main conclusions obtained from the visualizations of this section are:
- The annual accumulated rainfall has a decreasing trend for both Córdoba and Burgos.
- The downward trend is greater for Córdoba (coefficient = -2.97), being more moderate for Burgos (coefficient = -0.36)
5.3. Histograms
Histograms are a graphical representation of a frequency distribution of numeric data in a range of values. The horizontal axis represents the values of the data divided into intervals, called "bin", and the vertical axis represents the frequency or amount of data found in each "bin". Histograms are useful for identifying patterns in data, such as distribution, dispersion, symmetry, or bias.
We have generated two histograms with the distributions of the data corresponding to the total accumulated precipitation per year for the two meteorological stations, the chosen intervals being 50 mm.
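An equivalent histogram might be generated like this (p_sum being the assumed precipitation column):

```r
library(ggplot2)

# Distribution of annual accumulated precipitation in Burgos, 50 mm bins
ggplot(datos_graficas_B, aes(x = p_sum)) +
  geom_histogram(binwidth = 50, fill = "steelblue", color = "white") +
  labs(x = "Accumulated precipitation (mm)", y = "Frequency")
```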
The main conclusions obtained from the visualizations of this section are:
- The records of annual accumulated precipitation in Burgos present a distribution close to a normal and symmetrical distribution.
- The records of annual accumulated precipitation in Córdoba do not present a symmetrical distribution.
5.4. Box and whisker diagrams
Box and whisker diagrams are a graphical representation of the distribution of a set of numerical data. These graphs represent the median, the interquartile range, and the minimum and maximum values of the data. The box represents the interquartile range, that is, the range between the first and third quartiles of the data. Points beyond the whiskers, called outliers, can indicate extreme values or anomalous data. Box plots are useful for comparing distributions and detecting extreme values in your data.
We have generated a graph with the box diagrams corresponding to the accumulated rainfall data from the weather stations.
To understand the graph, the following points should be highlighted:
- The boundaries of the box indicate the first and third quartiles (Q1 and Q3), below which lie 25% and 75% of the data, respectively.
- The horizontal line inside the box is the median (equivalent to the second quartile Q2), which leaves half of the data below.
- The whisker limits are the extreme values, that is, the minimum value and the maximum value of the data series.
- The points outside the whiskers are the outliers.
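A possible sketch for this kind of box plot, assuming the two preprocessed tables described earlier, is the following:

```r
library(ggplot2)

# Combine both stations into one data frame with a station label
datos_lluvia <- rbind(
  data.frame(estacion = "Córdoba", p_sum = datos_graficas_C$p_sum),
  data.frame(estacion = "Burgos",  p_sum = datos_graficas_B$p_sum)
)

# Box and whisker diagram of accumulated rainfall per station
ggplot(datos_lluvia, aes(x = estacion, y = p_sum)) +
  geom_boxplot(fill = "steelblue") +
  labs(x = "Weather station", y = "Accumulated precipitation (mm)")
```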
The main conclusions obtained from the visualization of this section are:
- Both distributions present 3 extreme values, the most significant being those of Córdoba, with values greater than 1000 mm.
- The records of Córdoba show greater variability than those of Burgos, which are more stable.
5.5. Pie charts
A pie chart is a type of circular chart that represents proportions or percentages of a whole. It consists of several sections or sectors, where each sector represents a proportion of the whole set. The size of each sector is determined by the proportion it represents, expressed as an angle or percentage. It is a useful tool for visualizing the relative distribution of the different parts of a set, and it facilitates the visual comparison of proportions between groups.
We have generated two (polar) sector charts: the first with the number of days on which values exceed 30º in Córdoba, and the second with the number of days on which values fall below 0º in Burgos.
For the realization of these graphs, we have grouped the sum of the number of days described above into six groups, corresponding to periods of 5 years from 1990 to 2020.
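One possible way to build this kind of sector chart is sketched below; n_dias_30 is an assumed column holding the yearly count of days above 30º:

```r
library(dplyr)
library(ggplot2)

# Aggregate the annual counts of days above 30º into six 5-year periods
dias_calor <- datos_graficas_C %>%
  mutate(periodo = cut(anio, breaks = seq(1990, 2020, by = 5),
                       include.lowest = TRUE, dig.lab = 4)) %>%
  group_by(periodo) %>%
  summarise(dias = sum(n_dias_30))

# Sector (polar) chart: a bar chart wrapped onto polar coordinates
ggplot(dias_calor, aes(x = periodo, y = dias, fill = periodo)) +
  geom_col(width = 1) +
  coord_polar() +
  labs(title = "Days above 30º in Córdoba per 5-year period",
       x = NULL, y = NULL)
```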
The main conclusions obtained from the visualizations of this section are:
- There is an increase of 31.9% in the total number of annual days with temperatures above 30º in Córdoba for the period between 2015-2020 compared to the period 1990-1995.
- There is an increase of 33.5% in the total number of annual days with temperatures below 0º in Burgos for the period between 2015-2020 compared to the period 1990-1995.
5.6. Scatter plots
Scatter plots are a data visualization tool that represent the relationship between two numerical variables by locating points on a Cartesian plane. Each dot represents a pair of values of the two variables and its position on the graph indicates how they relate to each other. Scatter plots are commonly used to identify patterns and trends in data, as well as to detect any possible correlation between variables. These charts can also help identify outliers or data that doesn't fit the overall trend.
We have generated two scatter plots in which the values of the average maximum and average minimum temperatures are compared, looking for correlation trends between them for each weather station.
To analyze the correlations, not only visually through graphs but also numerically, we obtain Pearson's correlation coefficients. This coefficient is a statistical measure that indicates the degree of linear association between two quantitative variables. It is used to assess whether there is a positive linear relationship between two variables (both increase or decrease simultaneously at a constant rate), a negative one (the values of both variables vary in opposite directions) or none at all, as well as the strength of the relationship: the closer the coefficient is to +1, the stronger the positive association.
- Pearson coefficient (Average temperature max VS min) Córdoba: 0.15
- Pearson coefficient (Average temperature max VS min) Burgos: 0.61
In the image we observe that, while Córdoba shows greater dispersion, Burgos shows a stronger correlation.
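For reference, a scatter plot and its Pearson coefficient might be computed as follows (with the same assumed column names):

```r
library(ggplot2)

# Scatter plot: average maximum vs. average minimum temperature in Burgos
ggplot(datos_graficas_B, aes(x = tm_min, y = tm_max)) +
  geom_point(color = "steelblue", size = 2) +
  labs(x = "Average minimum temperature (ºC)",
       y = "Average maximum temperature (ºC)")

# Pearson correlation coefficient between both variables
cor(datos_graficas_B$tm_min, datos_graficas_B$tm_max, method = "pearson")
```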
Next, we will modify the previous scatter plots so that they provide us with more information visually. To do this, we divide the space into colored sectors (red for higher temperature values, blue for lower temperature values) and label each bubble with the corresponding year. It should be noted that the color-change limits of the quadrants correspond to the average values of each of the variables.
The main conclusions obtained from the visualizations of this section are:
- There is a positive linear relationship between the average maximum and minimum temperature in both Córdoba and Burgos, this correlation being greater in the Burgos data.
- The years with the highest values of maximum and minimum temperatures in Burgos are 2003, 2006 and 2020.
- The years with the highest values of maximum and minimum temperatures in Córdoba are 1995, 2006 and 2020.
5.7. Correlation matrix
The correlation matrix is a table that shows the correlations between all variables in a dataset. It is a square matrix that shows the correlation between each pair of variables on a scale ranging from -1 to 1. A value of -1 indicates a perfect negative correlation, a value of 0 indicates no correlation, and a value of 1 indicates a perfect positive correlation. The correlation matrix is commonly used to identify patterns and relationships between variables in a dataset, which can help to better understand the factors that influence a phenomenon or outcome.
We have generated two heat maps with the correlation matrix data for both weather stations.
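Such a heat map might be built along the following lines, first reshaping the correlation matrix into long format (a sketch under the same column-name assumptions):

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Correlation matrix of the numeric variables of the Burgos table
matriz_cor <- cor(select(datos_graficas_B, where(is.numeric)))

# Reshape to long format so each pair of variables becomes one row
cor_larga <- as.data.frame(matriz_cor) %>%
  rownames_to_column("var1") %>%
  pivot_longer(-var1, names_to = "var2", values_to = "correlacion")

# Heat map of the correlation matrix
ggplot(cor_larga, aes(x = var1, y = var2, fill = correlacion)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Pearson r")
```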
The main conclusions obtained from the visualizations of this section are:
- There is a notable negative correlation (-0.42 for Córdoba and -0.45 for Burgos) between the number of annual days with temperatures above 30º and accumulated rainfall. This means that as the number of days with temperatures above 30º increases, precipitation decreases significantly.
6. Conclusions of the exercise
Data visualization is one of the most powerful mechanisms for exploiting and analyzing the implicit meaning of data. As we have seen in this exercise, "ggplot2" is a powerful library capable of representing a wide variety of graphics with a high degree of customization that allows you to adjust numerous characteristics of each graph.
After analyzing the previous visualizations, we can conclude that, for both the Burgos and the Córdoba weather stations, temperatures (minimum, average, maximum) have risen considerably, days of extreme heat (temperature > 30º) have increased, and rainfall has decreased over the period analyzed, from 1990 to 2020.
We hope that this step-by-step visualization has been useful for learning some very common techniques in the treatment, representation and interpretation of open data. We will be back to show you new reuses. See you soon!
Updated: 21/03/2024
In January 2023, the European Commission published a list of high-value datasets that public sector bodies must make available to the public within a maximum of 16 months. The main objective of establishing the list of high-value datasets was to ensure that public data with the highest socio-economic potential are made available for re-use with minimal legal and technical restriction, and at no cost. Among these public sector datasets, some, such as meteorological or air quality data, are particularly interesting for developers and creators of services such as apps or websites, which bring added value and important benefits for society, the environment or the economy.
The publication of the Regulation has been accompanied by frequently asked questions to help public bodies understand the benefits of high-value datasets (HVDs) for society and the economy, as well as to explain some aspects of their obligatory nature and the support available for their publication.
In line with this proposal, Executive Vice-President for a Digitally Ready Europe, Margrethe Vestager, stated the following in the press release issued by the European Commission:
"Making high-value datasets available to the public will benefit both the economy and society, for example by helping to combat climate change, reducing urban air pollution and improving transport infrastructure. This is a practical step towards the success of the Digital Decade and building a more prosperous digital future".
In parallel, Internal Market Commissioner Thierry Breton also added the following words on the announcement of the list of high-value data: "Data is a cornerstone of our industrial competitiveness in the EU. With the new list of high-value datasets we are unlocking a wealth of public data for the benefit of all. Start-ups and SMEs will be able to use this to develop new innovative products and solutions to improve the lives of citizens in the EU and around the world."
Six categories to bring together new high-value datasets
The regulation is thus created under the umbrella of the European Open Data Directive, which defines six categories to differentiate the new high-value datasets requested:
- Geospatial
- Earth observation and environmental
- Meteorological
- Statistical
- Business
- Mobility
However, as stated in the European Commission's press release, this thematic range could be extended at a later stage depending on technological and market developments. The datasets will be made available in machine-readable format, via an application programming interface (API) and, where relevant, also with a bulk download option.
In addition, the reuse of datasets such as mobility or building geolocation data can expand the business opportunities available for sectors such as logistics or transport. In parallel, weather observation, radar, air quality or soil pollution data can also support research and digital innovation, as well as policy making in the fight against climate change.
Ultimately, greater availability of data, especially high-value data, has the potential to boost entrepreneurship as these datasets can be an important resource for SMEs to develop new digital products and services, which in turn can also attract new investors.
Find out more in this infographic:


Access the accessible two-page version.
Digital Earth Solutions is a technology company whose aim is to contribute to the conservation of marine ecosystems through innovative ocean modelling solutions.
Based on more than 20 years of CSIC studies in ocean dynamics, Digital Earth Solutions has developed a unique software capable of predicting, in a few minutes and with high precision, the geographical evolution of any spill or floating body (plastics, people, algae...), forecasting its trajectory at sea for the following days or tracing its origin by analysing its movement back in time.
Thanks to this technology, it is possible to minimise the impact of oil and other waste spills on coasts, seas and oceans.
