Medio ambiente | datos.gob.es

The benefits of open data in the agriculture and forestry sector: the case of Fruktia and Arbaria

Blog

As in other industries, digital transformation is helping to change the way the agriculture and forestry sector operates. Combining technologies such as geolocation or artificial intelligence and using open datasets to develop new precision tools is transforming agriculture into an increasingly technological and analytical activity.

Along these lines, the administrations are also making progress to improve management and decision-making in the face of the challenges we are facing. Thus, the Ministry of Agriculture, Fisheries and Food and the Ministry for Ecological Transition and the Demographic Challenge have designed two digital tools that use open data: Fruktia (crop forecasting related to fruit trees) and Arbaria (fire management), respectively.

Predicting harvests to better manage crises

Fruktia is a predictive tool developed by the Ministry of Agriculture to foresee oversupply situations in the stone fruit and citrus fruit sector before the traditional systems of knowledge of forecasts or gauges. After the price crises suffered in 2017 in stone fruit and in 2019 in citrus fruit due to a supervening oversupply, it became clear that decision-making to manage these crises based on traditional forecasting systems came too late and that it was necessary to anticipate in order to adopt more effective measures by the administration and even by the sector itself that would prevent prices from falling.

In response to this critical situation, the Ministry of Agriculture decided to develop a tool capable of predicting harvests based on weather and production data from previous years. This tool would be used internally by the Ministry and its analysis would be seen at the working tables with the sector, but would not be public under any circumstances, thus avoiding its possible influence on the markets in a way that could not be controlled.

Fruktia exists thanks to the fact that the Ministry has managed to combine information from two main sources: open data and the knowledge of sector experts. These data sources are collected by Artificial Intelligence which, using Machine Learning and Deep Learning technology, analyses the information to make specific forecasts.

The open datasets used come from:

Information from weather stations of the Spanish Meteorological Agency (AEMET).
Information from agro-climatic stations.

With the above data and statistical data from crop estimates of past campaigns (Production Advances and Yearbooks of the Ministry of Agriculture, Fisheries and Food) together with sector-specific information, Fruktia makes two types of crop predictions: at regional level (province model) and at farm level (enclosure model).

The provincial model is used to make predictions at provincial level (as its name suggests) and to analyse the results of previous harvests in order to:

Anticipate excess production.
Anticipate crises in the sector, improving decision-making to manage them.
Study the evolution of each product by province.

This model, although already developed, continues to be improved to achieve the best adaptation to reality regardless of the weather conditions.

On the other hand, the model of enclosures (still under development) aims to:

Production forecasts with a greater level of detail and for more products (for example, it will be possible to know production forecasts for stone fruit crops such as paraguayo or platerina for which we currently do not have information from statistical sources yet).
Knowing how crops are affected by specific weather phenomena in different regions.

The model of enclosures is still being designed, and when it is fully operational it will also contribute to:

Improve marketing planning.
Anticipate excess production at a more local level or for a specific type of product.
Predict crises before they occur in order to anticipate their effects and avoid a situation of falling prices.
Locate areas or precincts with problems in specific campaigns.

In other words, the ultimate aim of Fruktia is to achieve the simulation of different types of scenarios that serve to anticipate the problems of each harvest long before they occur in order to adopt the appropriate decisions from the administrations.

Arbaria: data science to prevent forest fires

A year before the birth of Fruktia, in 2019, the Ministry of Agriculture, Fisheries and Food designed a digital tool for the prediction of forest fires which, in turn, is coordinated from the forestry point of view by the Ministry for Ecological Transition and the Demographic Challenge.

Under the name of Arbaria, this initiative of the Executive seeks to analyse and predict the risk of fires occurring in specific temporal and territorial areas of the Spanish territory. In particular, thanks to the analysis of the data used, it is able to analyse the socio-economic influence on the occurrence of forest fires at the municipal level and anticipate the risk of fire in the summer season at the provincial level, thus improving access to the resources needed to tackle it.

The tool uses historical data from open information sources such as the AEMET or the INE, and the records of the General Forest Fire Statistics (EGIF). To do so, Artificial Intelligence techniques related to Deep and Machine Learning are used, as well as Amazon Web Services cloud technology.

However, the level of precision offered by a tool such as Arbaria is not only due to the technology with which it has been designed, but also to the quality of the open data selected.

Considering the demographic reality of each municipality as another variable to be taken into account is important when determining fire risk. In other words, knowing the number of companies based in a locality, the economic activity carried out there, the number of inhabitants registered or the number of agricultural or livestock farms present is relevant to be able to anticipate the risk and create preventive campaigns aimed at specific sectors.

In addition, the historical data on forest fires gathered in the General Forest Fire Statistics is one of the most complete in the world. There is a general register of fires since 1968 and another particularly exhaustive one from the 1990s to the present day, which includes data such as the location and characteristics of the surface of the fire, means used to extinguish it, extinguishing time, causes of the fire or damage to the area, among others.

Initiatives such as Fruktia or Arbaria serve to demonstrate the economic and social potential that can be extracted from open datasets. Being able to predict, for example, the amount of peaches that fruit trees in a municipality in Almeria will yield helps not only to plan job creation in an area, but also to ensure that sales and consumption in an area remain stable.

Likewise, being able to predict the risk of fires provides the tools for better fire prevention and extinction planning.

Content written by the datos.gob.es team

28/11/2022

First National Open Data Meeting to focus on the demographic challenge

Evento

The first National Open Data Meeting will take place in Barcelona on 21 November. It is an initiative promoted and co-organised by Barcelona Provincial Council, Government of Aragon and Provincial Council of Castellón, with the aim of identifying and developing specific proposals to promote the reuse of open data.

This first meeting will focus on the role of open data in developing territorial cohesion policies that contribute to overcoming the demographic challenge.

Agenda

The day will begin at 9:00 am and will last until 18:00 pm.

After the opening, which will be given by Marc Verdaguer, Deputy for Innovation, Local Governments and Territorial Cohesion of the Barcelona Provincial Council, there will be a keynote speech, where Carles Ramió, Vice-Rector for Institutional Planning and Evaluation at Pompeu Fabra University, will present the context of the subject.

Then, the event will be divided into four sessions where the following topics will be discussed:

10:30 a.m. State of art: lights and some shadows of opening and reusing data.
12:30 p.m. What does society need and expect from public administrations' open data portals?
15:00. Local commitment to fight against depopulation through open data
4:30 p.m. What can Public Administrations do using their data to jointly fight depopulation?

Experts linked to various open data initiatives, public organisations and business associations will participate in the conference. Specifically, the Aporta Initiative will participate in the first session, where the challenges and opportunities of the use of open data will be discussed.

The importance of addressing the demographic challenge

The conference will address how the ageing of the population, the geographical isolation that hinders access to health, administrative and educational centres and the loss of economic activity affect the smaller municipalities, both rural and urban. A situation with great repercussions on the sustainability and supply of the whole country, as well as on the preservation of culture and diversity.

26/10/2022

Analysis of the state and evolution of the national water reservoirs

Documentación

1. Introduction

Visualizations are graphical representations of data that allow the information linked to them to be communicated in a simple and effective way. The visualization possibilities are very broad, from basic representations such as line, bar or pie chart, to visualizations configured on control panels or interactive dashboards. Visualizations play a fundamental role in drawing conclusions from visual information, allowing detection of patterns, trends, anomalous data or projection of predictions, among many other functions.

Before starting to build an effective visualization, a prior data treatment must be performed, paying special attention to their collection and validation of their content, ensuring that they are in a proper and consistent format for processing and free of errors. The previous data treatment is essential to carry out any task related to data analysis and realization of effective visualizations.

In the section “Visualizations step-by-step” we are periodically presenting practical exercises on open data visualizations that are available in datos.gob.es catalogue and other similar catalogues. In there, we approach and describe in a simple way the necessary steps to obtain data, perform transformations and analysis that are relevant to creation of interactive visualizations from which we may extract information in the form of final conclusions.

In this practical exercise we have performed a simple code development which is conveniently documented, relying on free tools.

Access the Data Lab repository on Github.

Run the data pre-processing code on Google Colab.

2. Objetives

The main objective of this post is to learn how to make an interactive visualization using open data. For this practical exercise we have chosen datasets containing relevant information on national reservoirs. Based on that, we will analyse their state and time evolution within the last years.

3. Resources

3.1. Datasets

For this case study we have selected datasets published by Ministry for the Ecological Transition and Demographic Challenge, which in its hydrological bulletin collects time series data on the volume of water stored in the recent years in all the national reservoirs with capacity greater than 5hm3. Historical data on the volume of stored water are available at:

https://www.miteco.gob.es/es/agua/temas/evaluacion-de-los-recursos-hidricos/boletin-hidrologico/default.aspx

Furthermore, a geospatial dataset has been selected. During the search, two possible input data files have been found, one that contains geographical areas corresponding to the reservoirs in Spain and one that contains dams, including their geopositioning as a geographic point. Even though they are not the same thing, reservoirs and dams are related and to simplify this practical exercise, we choose to use the file containing the list of dams in Spain. Inventory of dams is available at: https://www.mapama.gob.es/ide/metadatos/index.html?srv=metadata.show&uuid=4f218701-1004-4b15-93b1-298551ae9446

https://www.miteco.gob.es/es/cartografia-y-sig/ide/descargas/egis_presa_geoetrs89_tcm30-175857.zip

This dataset contains geolocation (Latitude, Longitude) of dams throughout Spain, regardless of their ownership. A dam is defined as an artificial structure that limits entirely or partially a contour of an enclosure nestled in terrain and is destined to store water within it.

To generate geographic points of interest, a processing has been executed with the usage of QGIS tool. The steps are the following: download ZIP file, upload it to QGIS and save it as CSV, including the geometry of each element as two fields specifying its position as a geographic point (Latitude, Longitude).

Also, a filtering has been performed, in order to extract the data related to dams of reservoirs with capacity greater than 5hm3.

3.2. Tools

To perform data pre-processing, we have used Python programming language in the Google Colab cloud service, which allows the execution of JNotebooks de Jupyter.

Google Colab, also called Google Colaboratory, is a free service in the Google Research cloud which allows to program, execute and share a code written in Python or R through the browser, as it does not require installation of any tool or configuration.

Google Data Studio tool has been used for the creation of the interactive visualization.

Google Data Studio in an online tool which allows to create charts, maps or tables that can be embedded on websites or exported as files. This tool is easy to use and permits multiple customization options.

If you want to know more about tools that can help you with data treatment and visualization, see the report “Data processing and visualization tools”.

4. Enriquecimiento de los datos

In order to provide more information about each of the dams in the geospatial dataset, a process of data enrichment is carried out, as explained below.

To do this, we will focus on OpenRefine, which is a useful tool for this type of tasks. This open source tool allows to perform multiple data pre-processing actions, although at that point we will use it to conduct enrichment of our data by incorporation of context, automatically linking information that resides in a popular knowledge repository, Wikidata.

Once the tool is installed and launched on computer, a web application will open in the browser. In case this doesn´t happen, the application may be accessed by typing http://localhost:3333 in the browser´s search bar.

Steps to follow:

Step 1: Upload of CSV to the system (Figure 1).

Figura

Figure 1 – Upload of a CSV file to OpenRefine

Step 2: Creation of a project from uploaded CSV (Figure 2). OpenRefine is managed through projects (each uploaded CSV will become a project) that are saved for possible later use on a computer where OpenRefine is running. At this stage it´s required to name the project and some other data, such as the column separator, though the latter settings are usually filled in automatically.

Figure 2 – Creation of a project in OpenRefine

Step 3: Linkage (or reconciliation, according to the OpenRefine nomenclature) with external sources. OpenRefine allows to link the CSV resources with external sources, such as Wikidata. For this purpose, the following actions need to be taken (steps 3.1 to 3.3):
Step 3.1: Identification of the columns to be linked. This step is commonly based on analyst´s experience and knowledge of the data present in Wikidata. A tip: usually, it is feasible to reconcile or link the columns containing information of global or general character, such as names of countries, streets, districts, etc. and it´s not possible to link columns with geographic coordinates, numerical values or closed taxonomies (e.g. street types). In this example, we have found a NAME column containing name of each reservoir that can serve as a unique identifier for each item and may be a good candidate for linking
Step 3.2: Start of reconciliation. As indicated in figure 3, start reconciliation and select the only available source: Wikidata(en). After clicking Start Reconciling, the tool will automatically start searching for the most suitable vocabulary class on Wikidata, based on the values from the selected column.

Figure 3 – Start of the reconciliation process for the NAME column in OpenRefine

Step 3.3: Selection of the Wikidata class. In this step reconciliation values will be obtained. In this case, as the most probable value, select property “reservoir”, which description may be found at https://www.wikidata.org/wiki/Q131681 and it corresponds to the description of an “artificial lake to accumulate water”. It´s necessary to click again on Start Reconciling.

OpenRefine offers a possibility of improving the reconciliation process by adding some features that allow to target the information enrichment with higher precision. For that purpose, adjust property P4568, which description matches the identifier of a reservoir in Spain within SNCZI-IPE, as it may be seen in the figure 4.

Figure 4 – Selection of a Wikidata class that best represents the values on NAME column

Step 4: Generation of a column with reconciled or linked values. To do that, click on the NAME column and go to “Edit column → Add column based in this column”. A window will open where a name of the new column must be specified (in this case, WIKIDATA_RESERVOIR). In the expression box introduce: “http://www.wikidata.org/entity/”+cell.recon.match.id, so the values will be displayed as it´s previewed in figure 6. “http://www.wikidata.org/entity/” is a fixed text string that represents Wikidata entities, while the reconciled value of each of the values we obtain through the command cell.recon.match.id, that is, cell.recon.match.id(“ALMODOVAR”) = Q5369429.

Launching described operation will result in generation of a new column with those values. Its correctness may be confirmed by clicking on one of the new column cells, as it should redirect to a Wikidata web page containing information about reconciled value.

Repeat the process to add other type of enriched information as a reference for Google and OpenStreetMap.

Interfaz

Figure 5 – Generation of Wikidata entities through a reconciliation within a new column.

Step 5: Download of enriched CSV. Go to the function Export → Custom tabular exporter placed in the upper right part of the screen and select the features indicated in Figure 6.

Figura

Figure 6 – Options of CSV file download via OpenRefine

5. Data pre-processing

During the pre-processing it´s necessary to perform an exploratory data analysis (EDA) in order to interpret properly the input data, detect anomalies, missing data and errors that could affect the quality of subsequent processes and results, in addition to realization of the transformation tasks and preparation of the necessary variables. Data pre-processing is essential to ensure the reliability and consistency of analysis or visualizations that are created afterwards. To learn more about this process, see A Practical Introductory Guide to Exploratory Data Analysis.

The steps involved in this pre-processing phase are the following:

Installation and import of libraries
Import of source data files
Modification and adjustment of variables
Prevention and treatment of missing data (NAs)
Generation of new variables
Creation of a table for visualization “Historical evolution of water reserve between the years 2012-2022”
Creation of a table for visualization “Water reserve (hm3) between the years 2012-2022”
Creation of a table for visualization “Water reserve (%) between the years 2012-2022”
Creation of a table for visualization “Monthly evolution of water reserve (hm3) for different time series”
Saving the tables with pre-processed data

You may reproduce this analysis, as the source code is available in the GitHub repository. The way to provide the code is through a document made on Jupyter Notebook which once loaded to the development environment may be easily run or modified. Due to the informative nature of this post and its purpose to support learning of non-specialist readers, the code is not intended to be the most efficient but rather to be understandable. Therefore, it´s possible that you will think of many ways of optimising the proposed code to achieve a similar purpose. We encourage you to do it!

You may follow the steps and run the source code on this notebook in Google Colab.

6. Data visualization

Once the data pre-processing is done, we may move on to interactive visualizations. For this purpose, we have used Google Data Studio. As it´s an online tool, it´s not necessary to install software to interact or generate a visualization, but it´s required to structure adequately provided data tables.

In order to approach the process of designing the set of data visual representations, the first step is to raise the questions that we want to solve. We suggest the following: 

What is the location of reservoirs within the national territory?
Which reservoirs have the largest and the smallest volume of water (water reserve in hm3) stored in the whole country?
Which reservoirs have the highest and the lowest filling percentage (water reserve in %)?
What is the trend of the water reserve evolution within the last years?

Let´s find the answers by looking at the data!

6.1. Geographic location and main information on each reservoir

This visual representation has been created with consideration of geographic location of reservoirs and distinct information associated with each one of them. For this task, a table “geo.csv” has been generated during the data pre-processing.

Location of reservoirs in the national territory is shown on a map of geographic points.

Once the map is obtained, you may access additional information about each reservoir by clicking on it. The information will display in the table below. Furthermore, an option of filtering by hydrographic demarcation and by reservoir is available through the drop-down tabs.

View the visualization in full screen

6.2. Water reserve between the years 2012-2022

This visual representation has been made with consideration of water reserve (hm3) per reservoir between the years 2012 (inclusive) and 2022. For this purpose, a table “volumen.csv” has been created during the data pre-processing.

A rectangular hierarchy chart displays intuitively the importance of each reservoir in terms of volumn stored within the national total for the time period indicated above.

Ones the chart is obtained, an option of filtering by hydrographic demarcation and by reservoir is available through the drop-down tabs.

View the visualization in full screen

6.3. Water reserve (%) between the years 2012-2022

This visual representation has been made with consideration of water reserve (%) per reservoir between the years 2012 (inclusive) and 2022. For this task, a table “porcentaje.csv” has been generated during the data pre-processing.

The percentage of each reservoir filling for the time period indicated above is intuitively displayed in a bar chart.

Ones the chart is obtained, an option of filtering by hydrographic demarcation and by reservoir is available through the drop-down tabs.

View the visualization in ful screen

6.4. Historical evolution of water reserve between the years 2012-2022

This visual representation has been made with consideration of water reserve historical data (hm3 and %) per reservoir between the years 2012 (inclusive) and 2022. For this purpose, a table “lineas.csv” has been created during the data pre-processing.

Line charts and their trend lines show the time evolution of the water reserve (hm3 and %).

Ones the chart is obtained, modification of time series, as well as filtering by hydrographic demarcation and by reservoir is possible through the drop-down tabs.

View the visualization in full screen

6.5. Monthly evolution of water reserve (hm3) for different time series

This visual representation has been made with consideration of water reserve (hm3) from distinct reservoirs broken down by months for different time series (each year from 2012 to 2022). For this purpose, a table “lineas_mensual.csv” has been created during the data pre-processing.

Line chart shows the water reserve month by month for each time series.

Ones the chart is obtained, filtering by hydrographic demarcation and by reservoir is possible through the drop-down tabs. Additionally, there is an option to choose time series (each year from 2012 to 2022) that we want to visualize through the icon appearing in the top right part of the chart.

View the visualization in full screen

7. Conclusions

Data visualization is one of the most powerful mechanisms for exploiting and analysing the implicit meaning of data, independently from the data type and the user´s level of the technological knowledge. Visualizations permit to create meaningful data and narratives based on a graphical representation. In the set of implemented graphical representations the following may be observed:

A significant trend in decreasing the volume of water stored in the reservoirs throughout the country between the years 2012-2022.
2017 is the year with the lowest percentage values of the total reservoirs filling, reaching less than 45% at certain times of the year.
2013 is the year with the highest percentage values of the total reservoirs filling, reaching more than 80% at certain times of the year.

It should be noted that visualizations have an option of filtering by hydrographic demarcation and by reservoir. We encourage you to do it in order to draw more specific conclusions from hydrographic demarcation and reservoirs of your interest.

Hopefully, this step-by-step visualization has been useful for the learning of some common techniques of open data processing and presentation. We will be back to present you new reuses. See you soon!

27/07/2022

Copernicus data for the open data community

Documentación

This report published by the European Data Portal (EDP) aims to help open data users in harnessing the potential of the data generated by the Copernicus program.

The Copernicus project generates high-value satellite data, generating a large amount of Earth observation data, this is in line with the European Data Portal's objective of increasing the accessibility and value of open data.

The report addresses the following questions, What can I do with Copernicus data? How can I access the data?, and What tools do I need to use the data? using the information found in the European Data Portal, specialized catalogues and examining practical examples of applications using Copernicus data.

This report is available at this link: "Copernicus data for the open data community"

15/06/2022

Access to reuse of environmental information and open data

Blog

The favorable regime of access to environmental information

Environmental legislation has traditionally been characterized by establishing a more beneficial legal regime than that which has inspired the general rules on access to information held by the public sector. Indeed, the Aarhus Convention, adopted in 1998, was an important milestone in recognizing the right of access to environmental information under very advanced legal conditions, imposing relevant obligations on public authorities. Specifically, the Convention starts from an inescapable premise: in order for society to enjoy the right to a healthy environment and fulfill its duty to respect and protect it, it must have relevant access to environmental information. To this end, on the one hand, the right to obtain information held by public authorities was recognized and, on the other, an obligation was established for the latter to make certain information public without prior request.

In execution of said international treaty and, specifically, of the obligations assumed by the European Union through Directive 2003/4/EC of the European Parliament and of the Council, of January 28, 2003, on public access to environmental information, Law 27/2006, of July 18, regulating the rights of access to information, public participation and access to justice in environmental matters, was approved. Unlike the general regime contemplated in Law 19/2013, of December 9, on transparency, access to public information and good governance, Law 27/2006 does not contain any reference to open and reusable formats. However, it does include the following developments:

establishes the obligation to provide the information even when, without having generated it directly in the exercise of its functions, it is in the possession of the entity from which it is requested;
requires that the grounds for refusal of the request for access be interpreted in a restrictive manner, so that in case of doubt when interpreting the exceptions provided by law, access to information must be favored;
for those cases in which the request is not resolved and notified within the established period, the rule of positive silence is applied and, therefore, access will be understood to be granted.

The impact of regulations on open data and reuse of public sector information

As in the previous regulation, Directive (EU) 1024/2019 excludes its application in those cases in which the corresponding regulation of the Member States limits access. This would not be, therefore, the case of the environment sector, since, apart from the cases in which access is not applicable, in general the availability of the information is especially assured. Consequently, except for the legal exceptions to the obligation to provide environmental information, there are no specific restrictions that would be an obstacle to facilitating its reuse.

On the other hand, one of the main novelties of European legislation is a measure that ultimately obliges the Member States to adapt their regulations regarding access to environmental information. Indeed, Chapter V of the Directive establishes a unique regime for the so-called high-value datasets, which, in general, will be available free of charge, machine-readable, provided through APIs and, where appropriate, provided in the form of bulk download. Precisely, this very favorable legal regime is envisaged, among others, for the field of Earth Observation and Environment, although the specific datasets to which it will apply are still pending a decision by the European Commission after the elaboration of an extensive impact analysis whose final result is yet to be finalized.

On the other hand, following the European regulatory model, among the novelties that Royal Decree-Law 24/2021, of November 2, has incorporated into Spanish legislation on the reuse of public sector information, one that stands out is one referring to high-value data. Specifically, Article 3.ter of Law 37/2007 contemplates the possibility that, in addition to the datasets established by the European Commission, others may be added at the national level by the Ministry of Economic Affairs and Digital Transformation, taking into account the selection made by the Data Office Division, so that those specifically referring to the environment could be extended, where appropriate.

The potential for high-value environmental data

As the European regulation itself points out, the reuse of high-value datasets is seen as a tool to facilitate, among other objectives, the creation and dynamization of value-added digital applications and services that have the potential to generate considerable benefits for society, the environment and the economy. Thus, in this area, open data can play an important role in tackling technological innovation to address challenges of enormous relevance such as climate change, deforestation and, in general, the challenges posed by environmental conservation.

On the other hand, the development of digital applications and services can serve to revitalize rural areas and promote tourism models that value the knowledge and protection of natural resources, especially taking into account the rich and varied natural heritage existing in Spain, for which it is essential to have specific datasets, particularly with regard to natural areas.

Ultimately, from the perspective and demands of Open Government, the accessibility of environmental information, according to the standards of high-value data in accordance with the provisions of the regulations on the reuse of public sector information, could have a significant reinforcement by facilitating social control regarding the decisions of public entities and citizen participation. However, for this it is essential to overcome the model on which the regulatory framework on access to environmental information has traditionally been based, since, although at the time it represented a significant advance, the fact is that the 2006 regulation does not include any reference to the possibilities of technological innovation based on open data.

In short, it seems that the time has come to raise a debate about an eventual update of the sectorial regulation on access to environmental information in order to comply with the requirements of the legal regime contemplated in Directive (EU) 1024/2019.

Content prepared by Julián Valero, Professor at the University of Murcia and Coordinator of the Research Group "Innovation, Law and Technology" (iDerTec).

The contents and points of view reflected in this publication are the sole responsibility of its author.

22/02/2022

10 public data repositories related to natural sciences and the environment

Noticia

Open data is fundamental in the field of science. Open data facilitates scientific collaboration and enriches research by giving it greater depth. Thanks to this type of data, we can learn more about our environment and carry out more accurate analyses to support decisions.

In addition to the resources included in general data portals, we can find an increasing number of open data banks focused on specific areas of the natural sciences and the environment. In this article we bring you 10 of them.

10 public data repositories related to natural sciences and the environment body: NASA Open Data Portal, Copernicus, Climate Data Online, AlphaFold Protein Structure Database, Free GIS DATA, GBIF (Global Biodiversity Information Facility), EDI Data Portal, PANGAEA, re3data, IRIS

NASA Open Data Portal

Publisher: NASA

The data.nasa.gov portal centralizes NASA's open geospatial data generated from its rich history of planetary, lunar and terrestrial missions. It has nearly 20,000 unique monthly users and more than 40,000 datasets. A small percentage of these datasets are hosted directly on data.nasa.gov, but in most cases metadata and links to other space agency projects are provided.

data.nasa.gov includes a wide range of subject matter, from data related to rocket testing to geologic maps of Mars. The data are offered in multiple formats, depending on each publisher.

The site is part of the Open Innovation Sites project, along with api.nasa.gov, a space for sharing information about NASA APIs, and code.nasa.gov, where NASA's open source projects are collected.

Copernicus

Publisher: Copernicus

COPERNICUS is the Earth observation program of the European Union. Led by the European Commission, with the collaboration of the member states and various European agencies and organizations, it collects, stores, combines and analyzes data obtained through satellite observation and in situ sensor systems on land, air and sea.

It provides data through 6 services: emergency, security, marine monitoring, land monitoring, climate change and atmospheric monitoring. The two main access points to Copernicus satellite data are managed by ESA: the Copernicus Open Access Platform - which has an API - and the CSCDA (Copernicus Space Component Data Access). Other access points to Copernicus satellite data are managed by Eumetsat.

Climate Data Online

Publisher: NOAA (National Centers for Environmental Information)

Climate Data Online (CDO) from the US government agency NOAA provides free access to historical weather and climate data worldwide. Specifically, 26,000 datasets are offered, including daily, monthly, seasonal and annual measurements of parameters such as temperature, precipitation or wind, among others. Most of the data can be downloaded in CSV format.

To access the information, users can use, among other functionalities, a search tool, an API or a map viewer where a wide variety of data can be displayed in the same visualization environment, allowing variables to be related to specific locations.

AlphaFold Protein Structure Database

Publisher: DeepMind and EMBL-EBI

AlphaFold is an artificial intelligence system developed by the company DeepMind that predicts the 3D structure of a protein from its amino acid sequence. In collaboration with the EMBL European Bioinformatics Institute (EMBL-EBI), DeepMind has created this database that provides the scientific community with free access to these predictions.

The first version covers the human proteome and the proteomes of other key organisms, but the idea is to continue to expand the database to cover a large proportion of all cataloged proteins (over 100 million). The data can be downloaded in mmCIF or PDB format, which are widely accepted by 3D structure visualization programs such as PyMOL and Chimera.

Free GIS DATA

Publisher: Robin Wilson, expert in the GIS area.

Free GIS Data is the effort of Robin Wilson, a freelance expert in remote sensing, GIS, data science and Python. Here users can find a categorized list of links to over 500 websites offering freely available geographic datasets, all ready to be loaded into a Geographic Information System. You can find data on climate, hydrology, ecology, natural disasters, mineral resources, oil and gas, transportation and communications, or land use, among many other categories.

Users can contribute new datasets by sending them by email to robin@rtwilson.com.

GBIF (Global Biodiversity Information Facility)

Publisher: GBIF

GBIF is an intergovernmental initiative formed by countries and international organizations that collaborate to advance free and open access to biodiversity data. Through its nodes, participating countries provide data on species records based on common standards and open source tools. In Spain, the national node is GBIF-ES, sponsored by the Spanish Ministry of Science and Innovation and managed by the Spanish National Research Council (CSIC).

The data it offers comes from many sources, from specimens held in museums and collected in the 18th and 19th centuries to geotagged photos taken with smartphones and shared by amateur naturalists. It currently has more than 1.8 billion records and 63,000 datasets of great utility for researchers conducting studies related to biodiversity and the general public. You can also access its API here.

EDI Data Portal

Publisher:Environmental Data Initiative (EDI)

The Environmental Data Initiative (EDI) promotes the preservation and reuse of environmental data, supporting researchers to archive and publish publicly funded research data. This is done following FAIR principles and using the Ecological Metadata Language (EML) standard.

The EDI data portal contains the contributed environmental and ecological data packages, which can be accessed through a search engine or API. Users should contact the data provider before using the data in any research. These data should be properly cited when used in a publication. A Digital Object Identifier (DOI) is provided for this purpose.

PANGAEA

Publisher: World Data Center PANGEA

The PANGAEA information system functions as an open access library for archiving, publishing and distributing georeferenced Earth system research data.

Any user can upload data related to the natural sciences. PANGAEA has a team of editors who are responsible for checking the integrity and consistency of the data and metadata. It currently includes more than 400,000 datasets from more than 650 projects. The formats in which they are available are varied: you can find from text/ASCII or tab-delimited files, to binary objects (e.g. seismic data and models, among others) or other formats following ISO standards (such as images or movies).

re3data

Publisher: DataCite

Re3data is a worldwide registry of research data repositories covering databases from different academic disciplines available free of charge. It includes data related to the natural sciences, medicine or engineering, as well as those linked to humanities areas.

It currently offers detailed descriptions of more than 2,600 repositories. These descriptions are based on the re3data metadata schema and can be accessed through the re3data API. In this Github repository you can find examples of how to use the re3data API. These examples are implemented in R using Jupyter Notebooks.

IRIS

Publisher: Incorporated Research Institutions for Seismology (IRIS)

IRIS is a consortium of more than 100 U.S. universities dedicated to the operation of scientific facilities for the acquisition, management and distribution of seismological data. Through this website any citizen can access a variety of resources and data related to earthquakes occurring around the world.

It collects time series data, including sensor recordings of a variety of measurements. Among the metadata available is the location of the station from which the data was obtained and its instrumentation. It also provides access to historical seismic data, including scanned seismograms and other information from pre-digital sources.

Data are available in SEED (the international standard for the exchange of digital seismic data), ASCII or SAC (Seismic Analysis Code) format.

Do you know more international repositories with data related to natural sciences and environment? Leave us a comment or send us an email to dinamizacion@datos.gob.es.

23/12/2021

The 2nd Regional Meeting of Smart Municipalities will host an open data hackathon

Evento

The Alcazar of Jerez de la Frontera will host, on 23 and 24 September, the II Regional Meeting of Smart Municipalities. Its objective is to advance in the smart development of Andalusian municipalities, in line with the UN Sustainable Development Goals. The event is organised by the Provincial Council of Cadiz and the Andalusian Federation of Municipalities and Provinces, with the collaboration of the Regional Government of Andalusia, the City Council of Jerez, the University of Cadiz and the Smart City Cluster.

More than 20 presentations and round tables will take place over two days. Industry 4.0, artificial intelligence, SmartAgriFood or digital administration are some of the topics that will be discussed. The full programme (in Spanish) can be seen here.

The event will also host a hackathon with the aim of boosting the use and intelligence of data.

Hack4ERMI2021

Under the motto "Objective Smart and Resilient Territory", participants in the hackathon will have to use their creative and innovative thinking to find concrete and feasible solutions to 4 challenges:

Ecological transition, climate change and environmental sustainability 2.
Resilience and security
Data economy, competitiveness and progress
Health and welfare

All solutions should have in common the use and exploitation of open datasets, which can be combined with other sources of public information or data from IoT devices.

open data sets, which can be combined with other sources of public information or information from IoT devices.

To participate, a team of two to five people is required. Teams should be diverse in terms of gender, expertise and experience.

The competition will take place in several phases:

Preliminary phase, from 23 August to 10 September. Teams must submit a maximum of three ideas that respond to the challenges indicated above. To do so, they will have to submit a dossier explaining the idea and a video through the form provided for registration. A jury will evaluate the proposals and choose the five best ones, which will go on to the next phase. Those participants whose ideas have not been chosen may, if they wish, join one of the finalist teams.
Workshop. 16th September. Selected teams will have the opportunity to participate in an online workshop to learn how to use FiWoo, a FIWARE-based Smart City platform.
Hack4ERMI202: Ideas, Tech & Rock ́n Roll, 23-24 September. The teams will have a room available throughout the II Regional Meeting of Smart Municipalities, where they will be able to finalise the definition of the solutions presented. On the 24th they will present their proposals to the public at the congress.

The jury, made up of representatives of the organising and collaborating entities of the Meeting, will choose the 3 winners. The first winner will receive 2,000 euros, the second 1,000 euros and the third 500 euros.

If you have any questions, please contact the organisers by email at hack4ermi2021@dipucadiz.es.

Do you want to attend the II Regional Meeting of Smart Municipalities?

Participation in the hackathon is open to all citizens who wish to participate, but attendance at the II Regional Meeting of Smart Municipalities is limited, due to the pandemic situation.

Attendance in person is limited and by registration at the following link. However, the meeting can be followed online in its entirety via YouTube. The link will be available on the event's website in the coming weeks.

02/09/2021

The Spanish company CleanSpot, finalist of the EU Datathon 2021

Noticia

96 ideas from 33 countries. Those have been the proposals presented to the UE Datathon 2021, a competition organized by the Publications Office and the Presidency of the Council of the European Union to promote the use of open data as a basis for new ideas, innovative products and services.

Proposals could be submitted to three different categories: “A European Green Deal”, focused on promoting sustainability, “An economy that works for people”, focused on reducing poverty and inequality, and “A Europe fit for the digital age ”, which seeks improvements in competencies related to data and european strategy in the matter.

For each of these categories the jury has chosen 3 finalists.

CleanSpot, the Spanish presence in the contest

There were 12 proposals from Spanish teams for the EU Datathon 2021, one of them, CleanSpot, has achieved a place in the final, within the “A European Green Deal” category.

CleanSpot is an app that seeks to promote awareness and incentive of recycling through gamification. The app allows you to locate recycling and reuse points, such as clean points, specialized containers or collection services and centers. The novelty is that it also allows calculating the CO2 that each user avoids emitting into the atmosphere by performing a daily action such as throwing waste into the corresponding recycling container or donating it for reuse. Users can share their results and thus show the community how much they have reduced their carbon footprint, contributing to caring for the environment.

The users with the best score in the ranking receive prizes and recognition. In addition, each time the user goes to a collection or recycling point to deposit their waste, they accumulate points, which can be exchanged for discounts on municipal taxes, savings checks in local businesses or direct payments.

In addition, the app allows you to save favorite locations or services, and gives the option of receiving notifications, for example, reminders about the passage of the mobile clean point - for this service to be available in a specific municipality, prior integration is necessary -. It also allows the generation of awareness campaigns, with advice on recycling or information on specific actions in each area.

Finalists from 8 different countries

This year, the presence of finalists from multiple territories stands out. Only Italy repeats with three teams, one shared with France.

In the category of “A European Green Deal”, CleanSpot will face each other with FROG2G, from Montenegro, and The Carbons, from India. FROG2G is an interactive visualization tool, created to offer a viable model to make Europe greener, while The Carbons allows you to compare the greenhouse gases that are emitted, for example, when you have a cup of coffee or take a drive.
In the category "An economy that works for people", we find CityScale, from Ukraine, a tool to visualize, compare and find the best place to live; ITER IDEA, from Italy, a portal that facilitates the mobility of women in Europe; and PowerToYEUth, from Portugal, focused on locating public funding for SMEs and promoting youth employment.
Finally, in the category “A Europe fit for the digital age”, the finalists are Democracy Game, from Greece, a virtual debate tool; TrackmyEU, from Italy and France, which makes it possible to explore EU policies, follow topics of interest and make the voice of the citizenry heard in Brussels; and VislmE-360, also from Italy, which offers a 360ᵒ view of visual impairments in the EU.

Next steps

The nine finalist teams have 5 months to develop their proposals, from June to November. Proposals will be evaluated by a jury of experts, based on criteria such as lThe open data used and the fitness for purpose. The winner will receive € 18,000, while the second and third will receive € 10,000 and € 5,000 respectively.

The award ceremony will be on November 25, 2021, within the framework of the EU Open Data Days, an event that this year has its first edition. In this event, aimed at push the use of open data in Europe to generate value, we will be able to see the various opportunities and business models offered by the reuse of public information.

03/08/2021

Cross-Forest, a project for harmonisation and promotion of open forest data

Noticia

The Cross-Forest project combines two areas of great interest to Europe, as set out in the Green Deal. On the one hand, the care and protection of the environment - in particular our forests-. On the other hand, the promotion of an interoperable European digital ecosystem.

The project started in 2018 and ended on 23 June, resulting in different tools and resources, as we will see below.

What is Cross-Forest?

Cross-Forest is a project co-funded by the European Commission through the CEF (Connecting Europe Facility) programme, which seeks to publish and combine open and linked datasets of forest inventories and forest maps, in order to promote models that facilitate forest management and protection.

The project has been carried out by a consortium formed by the Tragsa Public Group, the University of Valladolid and Scayle Supercomputacion of Castilla y León, with the institutional support of the Ministry for Ecological Transition and the Demographic Challenge (MITECO, in Spanish acronyms). On the Portuguese side, the Direção-Geral do Território of Portugal has participated.

The project has developed:

A Digital Services Infrastructure (DSI) for open forest data, oriented towards modelling forest evolution at country level, as well as predicting forest fire behaviour and spread. Data on fuel materials, forest maps and spread models have been used. High Performance Computing (HPC) resources have been used for their execution, due to the complexity of the models and the need for numerous simulations.
An ontological model of forest data common to public administrations and academic institutions in Portugal and Spain, for the publication of linked open data. Specifically, a set of eleven ontologies has been created. These ontologies, which are aligned with the INSPIRE Directive, interrelate with each other and are enriched by linking to external ontologies. Although they have been created with a focus on these two countries, the idea is that any other territory can use them to publish their forest data, in an open and standard format.

The different datasets used in the project are published separately, so that users can use the ones they want. All the data, which are published under CC BY 4.0 licence, can be accessed through this Cross-Forest Github repository and the IEPNB Data Catalogue.

4 flagship projects in Linked Open Data format

Thanks to Cross-Forest, a large part of the information of 4 flagship projects of the General Directorate of Biodiversity, Forests and Desertification of the Ministry for Ecological Transition and the Demographic Challenge has been published in linked open data format:

National Forest Inventory (IFN-3). It includes more than 100 indicators of the state and evolution of the forests. These indicators range from their surface area or the tree and shrub species that inhabit them, to data related to regeneration and biodiversity. It also incorporates the value in monetary terms of the environmental, recreational and productive aspects of forest systems, among other aspects. It has more than 90,000 plots. Two databases corresponding to a subset of the NFI indicators have been published openly.
Forest Map of Spain (Scale 1:50.000). It consists of the mapping of the situation of forest stands, following a conceptual model of hierarchical land uses.
National Soil Erosion Inventory (INES). This is a study that detects, quantifies and cartographically reflects the main erosion processes affecting the Spanish territory, both forest and agricultural. Its objective is to know its evolution over time thanks to continuous data collection and it has more than 20,000 plots.
General Forest Fire Statistics. It includes the information collected in the Fire Reports that are completed by the Autonomous Communities for each of the forest fires that take place in Spain.

These datasets, along with others from this Ministry, have been federated with datos.gob.es, so that they are also available through our data catalogue. Like any other dataset that is published on datos.gob.es, they will automatically be federated with the European portal as well.

The predecessor of this project was CrossNature. This project resulted in the Eidos database, which includes linked data on wild species of fauna and flora in Spain and Portugal. It is also available on datos.gob.es and is reflected in the European portal.

Both projects are an example of innovation and collaboration between countries, with the aim of achieving more harmonised and interoperable data, facilitating to compare indicators and improve actions, in this case, in the field of forest protection.

22/07/2021

Do you know the sectors of datos.gob.es?

Noticia

A symptom of the maturity of an open data ecosystem is the possibility of finding datasets and use cases across different sectors of activity. This is considered by the European Open Data Portal itself in its maturity index. The classification of data and their uses by thematic categories boosts re-use by allowing users to locate and access them in a more targeted way. It also allows needs in specific areas to be detected, priority sectors to be identified and impact to be estimated more easily.

In Spain we find different thematic repositories, such as UniversiData, in the case of higher education, or TURESPAÑA, for the tourism sector. However, the fact that the competences of certain subjects are distributed among the Autonomous Communities or City Councils complicates the location of data on the same subject.

Datos.gob.es brings together the open data of all the Spanish public bodies that have carried out a federation process with the portal. Therefore, in our catalogue you can find datasets from different publishers segmented by 22 thematic categories, those considered by the Technical Interoperability Standard.

Icons of the categories available in the data catalogue and the number of datasets of each category (https://datos.gob.es/es/catalogo)

Number of datasets by category as of June 2021

But in addition to showing the datasets divided by subject area, it is also important to show highlighted datasets, use cases, guides and other help resources by sector, so that users can more easily access content related to their areas of interest. For this reason, at datos.gob.es we have launched a series of web sections focused on different sectors of activity, with specific content for each area.

4 sectorial sections that will be gradually extended to other areas of interest

Currently in datos.gob.es you can find 4 sectors: Environment, Culture and leisure, Education and Transport. These sectors have been highlighted for different strategic reasons:

Environment: Environmental data are essential to understand how our environment is changing in order to fight climate change, pollution and deforestation. The European Commission itself considers environmental data to be highly valuable data in Directive 2019/1024. At datos.gob.es you can find data on air quality, weather forecasting, water scarcity, etc. All of them are essential to promote solutions for a more sustainable world.
Transport: Directive 2019/1024 also highlights the importance of transport data. Often in real time, this data facilitates decision-making aimed at efficient service management and improving the passenger experience. Transport data are among the most widely used data to create services and applications (e.g. those that inform about traffic conditions, bus timetables, etc.). This category includes datasets such as real-time traffic incidents or fuel prices.
Education: With the advent of COVID-19, many students had to follow their studies from home, using digital solutions that were not always ready. In recent months, through initiatives such as the Aporta Challenge, an effort has been made to promote the creation of solutions that incorporate open data in order to improve the efficiency of the educational sphere, drive improvements - such as the personalisation of education - and achieve more universal access to knowledge. Some of the education datasets that can be found in the catalogue are the degrees offered by Spanish universities or surveys on household spending on education.
Culture and leisure: Culture and leisure data is a category of great importance when it comes to reusing it to develop, for example, educational and learning content. Cultural data can help generate new knowledge to help us understand our past, present and future. Examples of datasets are the location of monuments or listings of works of art.

Structure of each sector

Each sector page has a homogeneous structure, which facilitates the location of contents also available in other sections.

It starts with a highlight where you can see some examples of outstanding datasets belonging to this category, and a link to access all the datasets of this subject in the catalogue.

It continues with news related to the data and the sector in question, which can range from events or information on specific initiatives (such as Procomún in the field of educational data or the Green Deal in the environment) to the latest developments at strategic and operational level.

Finally, there are three sections related to use cases: innovation, reusing companies and applications. In the first section, articles provide examples of innovative uses, often linked to disruptive technologies such as Artificial Intelligence. In the last two sections, we find specific files on companies and applications that use open data from this category to generate a benefit for society or the economy.

Highlights section on the home page

In addition to the creation of sectoral pages, over the last year, datos.gob.es has also incorporated a section of highlighted datasets. The aim is to give greater visibility to those datasets that meet a series of characteristics: they have been updated, are in CSV format or can be accessed via API or web services.

Screenshot of the homepage with the highlighted datasets

What other sectors would you like to highlight?

The plans of datos.gob.es include continuing to increase the number of sectors to be highlighted. Therefore, we invite you to leave in comments any proposal you consider appropriate.

01/07/2021