The deadline for receiving applications to participate in the IV Aporta Challenge closed on 15 February. In total, 38 valid proposals were received in due time and form, all of high quality, aimed at promoting improvements in the health and well-being of citizens through the reuse of data offered by public administrations.
Disruptive technologies, key to extracting maximum value from data
According to the competition rules, in this first phase, participants had to present ideas that identified new opportunities to capture, analyse and use data intelligence in the development of solutions of all kinds: studies, mobile applications, services or websites.
All the ideas seek to address various challenges related to health and wellbeing, many of which have a direct impact on our healthcare system, such as improving the efficiency of services, optimising resources or boosting transparency. Some of the areas addressed by participants include pressure on the health system, diagnosis of diseases, mental health, healthy lifestyles, air quality and the impact of climate change.
Many of the participants have chosen to use disruptive technologies to address these challenges. Among the proposals, we find solutions that harness the power of algorithms to cross-reference data and determine healthy habits or predictive models that allow us to know the evolution of diseases or the situation of the health system. Some even use gamification techniques. There are also a large number of solutions aimed at bringing useful information to citizens, through maps or visualisations.
Likewise, the specific groups at which the solutions are aimed are diverse: we find tools aimed at improving the quality of life of people with disabilities, the elderly, children, individuals who live alone or who need home care, etc.
Proposals from all over Spain and with a greater presence of women
Teams and individuals from all over Spain have been encouraged to participate in the Challenge. We have representatives from 13 Autonomous Communities: Madrid, Catalonia, the Basque Country, Andalusia, Valencia, the Canary Islands, Galicia, Aragon, Extremadura, Castile and Leon, Castile-La Mancha, La Rioja and Asturias.
25% of the proposals were submitted by individuals and 75% by multidisciplinary teams made up of several members. Looking at the legal status of the applicants, natural persons account for 75% and legal entities for 25%. In the latter category, we find teams from universities, organisations linked to the Public Administration and different companies.
It is worth noting that in this edition the number of women participants has increased, demonstrating the progress of our society in the field of equality. Two editions ago, 38% of the proposals were submitted by women or by teams with women members. Now that number has risen to 47.5%. While this is a significant improvement, there is still work to be done in promoting STEM subjects among women and girls in our country.
Jury deliberation begins
Once the proposals have been accepted, it is time for the assessment by the jury, made up of experts in the field of innovation, data and health. The assessment will be based on a series of criteria detailed in the rules, such as the overall quality and clarity of the proposed idea, the data sources used or the expected impact of the proposed idea on improving the health and well-being of citizens.
The 10 proposals with the best evaluation will move on to phase II, and will have a minimum of two months to develop the prototype resulting from their idea. The proposals will be presented to the same jury, which will score each project individually. The three prototypes with the highest scores will be the winners and will receive a prize of 5,000, 4,000 and 3,000 euros, respectively.
Good luck to all participants!
Today, no one can deny that open data holds great economic power. The European Commission itself estimates that the turnover of open data in the EU27 could reach 334.2 billion euros in 2025, driven by its use in areas linked to disruptive technologies such as artificial intelligence, machine learning or language technologies.
But in addition to its economic impact, open data also has an important value for society: it provides information that makes social reality visible, driving informed decision-making for the common good.
There are thousands of areas where open data is essential, from refugee crises to the inclusion of people with disabilities, but in this article we will focus on the scourge of gender violence.
Where can I obtain data on the subject?
Globally, agencies such as the UN, the World Health Organization and the World Bank offer resources and statistics related to violence against women.
In our country, local, autonomous and state agencies publish related datasets. To facilitate unified access to them, the Government Delegation against Gender Violence has a statistical portal that includes in a single space data from various sources such as the Ministry of Finance and Public Administration, the General Council of the Judiciary or the Public Employment Service of the Ministry of Employment and Social Security. The user can cross-reference variables and create tables and graphs to facilitate the visualization of the information, as well as export the data sets in CSV or Excel format.
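As a minimal, purely illustrative sketch of how an export from such a portal could be explored (the file name, separator and column names below are hypothetical assumptions for this example, not actual resources of the statistical portal), a few lines of Python with pandas are enough to cross-reference two variables:

import pandas as pd

# Hypothetical CSV export from the statistical portal; the file name,
# separator and column names are illustrative assumptions.
df = pd.read_csv("denuncias_por_comunidad.csv", sep=";", encoding="utf-8")

# Cross-reference two variables: complaints per autonomous community and year.
tabla = df.pivot_table(index="comunidad_autonoma", columns="anio",
                       values="denuncias", aggfunc="sum")
print(tabla.head())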
Projects to raise awareness and visibility
But data alone can be complicated to understand. Data need a context that gives them meaning and transforms them into information and knowledge. This is where different projects arise that seek to bring data to the public in a simple way.
There are many associations and organizations that take advantage of published data to create visualizations and data-driven stories that help to raise awareness about gender violence. As an example, the Barcelona Open Data Initiative is developing the "DatosXViolenciaXMujeres" project. It is a visual and interactive tour of the impact of gender violence in Spain and by Autonomous Communities during the period 2008-2020, and it is updated periodically. Using data storytelling techniques, it shows the evolution of gender violence within the couple, the judicial response (orders issued and final convictions), the public resources allocated, the impact of COVID-19 in this area and crimes of sexual violence. Each graph includes links to the original source and to places where the data can be downloaded so that they can be reused in other projects.
Another example is "Datos contra el ruido" (Data against noise), developed within the framework of GenderDataLab, a collaborative platform for the digital common good that has the support of various associations, such as Pyladies or Canodron, and the Barcelona City Council, among others. This association promotes the inclusion of the gender perspective in the collection of open data through various projects, such as the aforementioned "Datos contra el ruido", which makes the information published by the judicial system and the police on gender violence visible and understandable. Through data and visualizations, it provides information on the types of crimes or their geographical distribution throughout our country, among other issues. As with "DatosXViolenciaXMujeres", a link to the original source of the data and download spaces are included.
Tools and solutions to support victims
But in addition to providing visibility, open data can also give us information on the resources dedicated to helping victims, as we saw in some of the previous projects. Making this information available to victims in a quick and easy way is essential. Maps showing the location of help centers are of great help, such as this one from the SOL.NET project, with information on organizations that offer support and care services for victims of gender-based violence in Spain. Or this one with the centers and social services of the Valencian Community aimed at disadvantaged groups, including victims of gender violence, prepared by the public institution itself.
This information is also incorporated in applications aimed at victims, such as Anticípate. This app not only provides information and resources to women in vulnerable situations, but also has an emergency call button and allows access to legal, psychological or even self-defense advice, facilitating access to a social criminologist.
In short, this is a particularly sensitive issue, one we must continue to raise awareness of and fight to put an end to. A task to which open data can make a significant contribution.
If you know of any other example that shows the power of open data in this field, we encourage you to share it in the comments section or send us an email to dinamizacion@datos.gob.es.
Content prepared by the datos.gob.es team.
Today, Artificial Intelligence (AI) applications are present in many areas of everyday life, from smart TVs and speakers that are able to understand what we ask them to do, to recommendation systems that offer us services and products adapted to our preferences.
These AIs "learn" thanks to various techniques, including supervised, unsupervised and reinforcement learning. In this article we will focus on reinforcement learning, which is based mainly on trial and error, similar to how humans, and animals in general, learn.
The key to this type of system is to correctly set long-term goals in order to find an optimal global solution, without focusing too much on immediate rewards, which do not allow for an adequate exploration of the set of possible solutions.
Simulation environments as a complement to open data sets
Unlike other types of learning, where learning is usually based on historical datasets, this type of technique requires simulation environments that allow training a virtual agent through its interaction with an environment, where it receives rewards or penalties depending on the state and actions it performs. This cycle between agent and environment can be seen in the following diagram:

Figure 1 - Scheme of learning by reinforcement [Sutton & Barto, 2015]
That is, starting from a simulated environment, with an initial state, the agent performs an action that generates a new state and a possible reward or penalty, which depends on the previous states and the action performed. The agent learns the best strategy in this simulated environment from experience, exploring the set of states, and being able to recommend the best action policy if configured appropriately.
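To make this loop concrete, the following minimal sketch (a toy example written for this article, not taken from any of the projects mentioned) implements tabular Q-learning, a classic reinforcement learning algorithm, on a tiny corridor environment: the agent only receives a reward when it reaches the goal state and, by trial and error, it ends up learning the action policy that maximises the long-term return.

import random
from collections import defaultdict

# Toy corridor: states 0..4, the goal is state 4; reaching it yields a reward of 1.
N_STATES = 5
ACTIONS = [-1, +1]                  # move left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

q = defaultdict(float)              # Q-value table keyed by (state, action)

def step(state, action):
    # Simulated environment: returns (next_state, reward, done).
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

for episode in range(500):
    state, done = N_STATES // 2, False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise exploit what is known so far.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: move the estimate towards reward + discounted future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# After training, the learned policy points towards the goal from every state.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})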
The best-known example worldwide was the success achieved by AlphaGo, which beat 18-time world champion Lee Sedol in 2016. Go is an ancient game, considered one of the 4 basic arts in Chinese culture, along with music, painting and calligraphy. Unlike chess, the number of possible game combinations is greater than the number of atoms in the Universe, which makes it impossible to solve with traditional algorithms.
Curiously, the technological breakthrough demonstrated by AlphaGo in solving a problem that was claimed to be beyond the reach of an AI was eclipsed a year later by its successor, AlphaGo Zero. In this version, its creators chose not to use historical data or heuristic rules. AlphaGo Zero only uses the board positions and learns by trial and error by playing against itself.
Following this innovative learning strategy, in 3 days of execution it managed to beat AlphaGo, and after 40 days it became the best Go player in the world, accumulating thousands of years of human knowledge in a matter of days and even discovering previously unknown strategies.
The impact of this technological milestone reaches countless areas in which we can now count on AI solutions that learn to solve complex problems from experience: from resource management and strategy planning to the calibration and optimization of dynamic systems.
The development of solutions in this area is limited above all by the need for appropriate simulation environments, which are usually the most complex component to build. However, there are multiple repositories of open simulation environments that allow us to test this type of solution.
The best known reference is Open AI Gym, which includes an extensive set of libraries and open simulation environments for the development and validation of reinforcement learning algorithms. Among others, it includes simulators for the basic control of mechanical elements, robotics applications and physics simulators, two-dimensional ATARI video games, and even the landing of a lunar module. In addition, it allows us to integrate and publish new open simulators adapted to our needs, which can then be shared with the community:

Figure 2 - Examples of visual simulation environments offered by Open AI Gym
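As a minimal illustration of how these environments are used (a generic sketch for this article, not tied to any specific project; the exact method signatures vary between gym versions, and recent releases return additional values from reset() and step()), the following code runs one of the classic control environments with randomly sampled actions:

import gym

# Create one of the classic control environments included in Open AI Gym.
env = gym.make("CartPole-v1")

obs = env.reset()
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()          # a trained agent would choose the action here
    obs, reward, done, info = env.step(action)  # the environment returns the new state and the reward
    total_reward += reward
    if done:                                    # the episode ended (the pole fell), start again
        obs = env.reset()
env.close()
print("Accumulated reward:", total_reward)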
Another interesting reference is Unity ML Agents, where we also find multiple libraries and several simulation environments, along with the possibility of integrating our own simulator:

Figure 3 - Examples of visual simulation environments offered by Unity ML Agents
Potential applications of reinforcement learning in public administrations
This type of learning is used especially in areas such as robotics, resource optimization or control systems, allowing the definition of optimal policies or strategies for action in specific environments.
One of the best-known practical examples is the DeepMind algorithm that Google used in 2016 to reduce the energy consumption required to cool its data centers by 40%, as can be seen in the following graph (taken from the previous article):

Figure 4 - Results of the DeepMind algorithm on the energy consumption of Google's data centers.
The algorithm employed uses a combination of deep learning and reinforcement learning techniques, together with a general purpose simulator to understand complex dynamic systems that could be applied in multiple environments such as transformation between energy types, water consumption or resource optimization in general.
Other possible applications in the public domain include the search and recommendation of open datasets through chatbots, or the optimization of public policies, as is the case of the European project Policy Cloud, applied for example in the analysis of future strategies of the different designations of origin of wines from Aragon.
In general, the application of this type of techniques could optimize the use of public resources by planning action policies that result in more sustainable consumption, reducing pollution, waste and public spending.
Content prepared by Jose Barranquero, expert in Data Science and Quantum Computing.
The contents and views expressed in this publication are the sole responsibility of the author.
Autumn is coming to an end and, as every time we change season, we would like to summarize some of the main news and developments of the last three months.
One of the main advances of open data in our country has occurred in the legislative field, with the approval of the transposition of the European Directive on open data and reuse of public sector information. It has been included in the Royal Decree-Law 24/2021, validated last December 2 by the Congress. You can read about the new features here.
It is expected that, under the protection of this regulation, the Spanish open data ecosystem will continue to grow, as shown by the new developments in recent months. In the case of datos.gob.es, we have reached 160 public administrations publishing data this fall, exceeding 50,000 datasets accessible from the National Catalog. In addition, many organizations and reusers have launched new projects linked to open data, as we will see below.
La Palma volcano, an example of the value of public data
This autumn will be remembered in our country for the eruption of the volcano on the island of La Palma. A situation that has highlighted the importance of the publication and use of open data for the management of natural emergencies.
The open data portal of La Palma has offered -and still offers- updated information about the eruption in Cumbre Vieja. Data on perimeters, photogrammetry, thermography or terrain models can be consulted and downloaded from its website. In addition, the Cabildo Insular has created a unified point that gathers all the information of interest in real time in a simple way. The data and tools made available to the public by the National Geographic Institute are also worth highlighting. Likewise, the Copernicus Earth Observation Program offers data and maps of interest, as well as management support.
All these data have allowed the development of 3D viewers and tools to compare the situation prior to the eruption with the current one, which are very useful for understanding the magnitude of the event and making decisions accordingly. The data have also been used by the media and reusers to create visuals that help transmit information to the public, such as this visual tour of the tongue of fire, from the recording of the seismic activity to its arrival at the sea, or this animation that shows, in just 30 seconds, the 5,000 seismic movements recorded in La Palma to date.
Growing use of open data and new technologies
In addition to emergency management, open data is also increasingly being used by public agencies to improve the efficiency and effectiveness of their activities, often in conjunction with disruptive technologies such as artificial intelligence. Some examples are shown below:
- The Murcia City Council has announced that it is developing new sustainable mobility management models using data from the Copernicus Earth Observation Program. The information obtained will make it possible to offer new intelligent mobility services oriented to citizens, companies and public administrations of the municipality.
- Las Palmas de Gran Canaria has presented a Sustainable Tourism Intelligence System. It is a digital tool that provides updated data from multiple sources for decision-making and improving competitiveness, both for companies and for the tourist destination itself.
- The Vigo City Council plans to create a 3D model of the entire city, combining open data with geographic data. This action will be used to develop elements such as noise, pollution and traffic maps, among others.
- The Junta de Castilla y León is working on the Bision Project, a Business Intelligence system for better decision making in the field of health. The new system would automatically allow the development of instruments for evaluating the quality of the healthcare system. It is worth mentioning that the Junta de Castilla y León has received an award for the quality and innovation of its transparency portal during the pandemic thanks to its open data platform.
- The massive Artificial Intelligence system of the Spanish language MarIA, promoted by the State Secretariat for Digitalization and Artificial Intelligence, is making progress in its development. Based on the web archives of the Spanish National Library, a new version has been created that allows summarizing existing texts and creating new ones based on headlines or words.
What's new in open data platforms
In order to continue developing this type of projects, it is essential to continue promoting access to quality data and tools that facilitate their exploitation. In this regard, some of the new developments are:
- The Government of Navarra has presented a new portal in a conference at the University of Navarra, with the participation of the Aporta initiative. Cordoba, which has also approved the implementation of a new open data and transparency portal, will soon follow.
- Aragon's open data portal has released a new version of its API, the GA_OD_Core service. Its aim is to offer citizens and developers the ability to access the data offered on the portal and integrate them into different apps and services through REST architecture. In addition, Aragón has also presented a new virtual assistant that facilitates the location and access to data. It is a chatbot that provides answers based on the data it contains, with a conversational level that can be understood by the receiver.
- The Ministry of Transport, Mobility and Urban Agenda has published a viewer to consult the transport infrastructures belonging to the Trans-European Transport Network in Spain (TEN-T). The viewer allows the downloading of open data of the network, as well as projects co-financed with CEF funds.
- The Madrid City Council has presented Cibeles+, an artificial intelligence project to facilitate access to urban planning information. Using natural language processing techniques and machine learning, the system responds to complex urban planning questions and issues through Alexa and Twitter.
- The geoportal of the IDE Barcelona has launched a new open geographic data download service with free and open access. Among the information published in vector format, the topographic cartography (scale 1:1000) stands out. It includes urban and developable areas and sectors of interest in undeveloped land.
- The Valencia City Council has developed a data inventory to measure efficiency in the implementation of public policies. This tool will also allow citizens to access public information more quickly.
It should also be noted that the Pinto City Council has confirmed its adherence to the principles of the international Open Data Charter (ODC) with the commitment to improve open data policies and governance.
Boosting reuse and data-related capabilities
Public bodies have also launched various initiatives to promote the use of data. Among them, datos.gob.es launched at the end of November the fourth edition of the Aporta Challenge, focused on the field of health and welfare. This seeks to identify and recognize new ideas and prototypes that drive improvements in this field, using open data from public bodies.
This season we also met the winners of the V Open Data Contest organized by the Junta de Castilla y León. Of the 37 applications received, a jury of experts in the field has chosen 8 projects that have emerged as winners in the various categories.
Also, increasingly popular are the courses and seminars that are launched to increase the acquisition of data-related knowledge. Here are two examples:
- The City Council of L'Hospitalet de Llobregat has launched a training program on the use of data for municipal workers. This plan is structured in 22 different courses that will be taught until May next year.
- The Open Cities project has delivered a cycle of workshops related to open data in Smart cities. The complete video is available at this link.
Other news of interest in Europe
At the European level, we have ended the autumn with two major actions: the publication of the Open Data Maturity Index 2021, prepared by the European Data Portal, and the celebration of the EU Open Data Days. In the first one, it should be noted that Spain is in third position and is once again among the leading countries in open data in Europe. For its part, the EU Open Data Days were made up of the EU DataViz 2021 conference and the final of the EU Datathon 2021, where the Spanish company CleanSpot came second in its category. This app raises awareness and encourages recycling and reuse of products through gamification.
The European portal has also launched the Open Data Academy, with all available courses structured around four themes: policy, impact, technology and quality (the same as those assessed by the aforementioned maturity index). The curriculum is constantly updated with new materials.
Other new features include:
- The DCAT Application Profile for Data Portals in Europe (DCAT-AP) has been updated. The preliminary version of DCAT-AP version 2.1.0 was available for public review between October and November 2021.
- Asedie has been selected by the Global Data Barometer (GDB) and Access Info Europe as "Country Researcher" for the elaboration of the 1st edition of the Global Data Barometer 20-21. This is a research project that analyses how data is managed, shared and used for the common good.
- The OpenCharts map catalog of the European Centre for Medium-Range Weather Forecasts, which offers hundreds of open charts, has been updated. This RAM magazine article tells you what's new.
- The UK government has launched one of the world's first national standards for algorithmic transparency. This follows commitments made in its National Artificial Intelligence Strategy and its National Data Strategy.
Do you know of other examples of projects related to open data? Leave us a comment or write to dinamizacion@datos.gob.es.
1. Introduction
Visualizations are graphic representations that allow us to understand in a simple way the information contained in the data. Thanks to visual elements, such as graphs, maps or word clouds, visualizations also help to explain trends, patterns or outliers that data may present.
Visualizations can be generated from data of very different natures, such as the words that make up a news article, a book or a song. To build visualizations from this kind of data, machines, through software programs, must be able to understand, interpret and recognize the words that form human speech (both written and spoken) in multiple languages. The field of study focused on this kind of data processing is called Natural Language Processing (NLP). It is an interdisciplinary field that combines the powers of artificial intelligence, computational linguistics and computer science. NLP-based systems have enabled great innovations such as Google's search engine, Amazon's voice assistant, automatic translators, sentiment analysis on different social networks or even spam detection in an email account.
In this practical exercise, we will build a graphical visualization of the keywords that summarize several texts, extracted using NLP techniques. Specifically, we are going to create a word cloud that shows the most recurrent terms in several posts of the portal.
This visualization is part of a series of practical exercises in which open data available on the datos.gob.es portal is used. These exercises address and describe, in a simple way, the steps necessary to obtain the data and to perform the transformations and analyses relevant to creating the visualization, extracting as much information as possible. In each of the practical exercises, simple, conveniently documented code developments are used, as well as free and open-source tools. All the generated material will be available in the Data Lab repository on GitHub.
2. Objectives
The main objective of this post is to learn how to create a visualization built from sets of words representative of various texts, popularly known as "word clouds". For this practical exercise we have chosen 6 posts published in the blog section of the datos.gob.es portal. From these texts, using NLP techniques, we will generate a word cloud for each one, which will allow us to detect in a simple and visual way the frequency and importance of each word, making it easier to identify the keywords and the main theme of each post.

From a text we build a cloud of words applying Natural Language Processing (NLP) techniques
3. Resources
3.1. Tools
To perform the data pre-processing (setting up the work environment, programming and the editing itself) as well as the visualization, Python (version 3.7) and Jupyter Notebook (version 6.1) are used, tools that you will find integrated, along with many others, in Anaconda, one of the most popular platforms for installing, updating and managing software for data science work. To tackle the tasks related to Natural Language Processing, we use two libraries: Scikit-Learn (sklearn) and wordcloud. All these tools are open source and available for free.
Scikit-Learn is a vast and very popular machine learning library. Among others, it has algorithms to perform classification, regression, clustering and dimensionality reduction tasks. It also provides utilities for working with textual data, handling textual feature sets in the form of matrices and performing tasks such as calculating similarities, classifying text and clustering. In Python, it is also possible to perform this type of task with other equally popular libraries, such as NLTK or spaCy, among others.
wordcloud is a library specialized in creating word clouds using a simple algorithm that can be easily modified.
To facilitate understanding for readers not specialized in programming, the Python code included below, accessible by clicking on the "Code" button in each section, is not designed to maximize efficiency but to ease comprehension, so more advanced readers may well find more efficient, alternative ways to code some functionalities. The reader will be able to reproduce this analysis if desired, as the source code is available on the datos.gob.es GitHub account. The code is provided as a Jupyter Notebook which, once loaded into the development environment, can be easily executed or modified.
3.2. Datasets
For this analysis, 6 posts recently published in the blog section of the open data portal datos.gob.es have been selected. These posts cover different topics related to open data:
- The latest news in natural language processing: summaries of classic works in just a few hundred words.
- The importance of anonymization and data privacy.
- The value of real-time data through a practical example.
- New initiatives to open and harness data for health research.
- Kaggle and other alternative platforms for learning data science.
- The Spatial Data Infrastructure of Spain (IDEE), a benchmark for geospatial information.
4. Data processing
Before moving on to the creation of an effective visualization, we must carry out a preliminary treatment or pre-processing of the data, paying attention to how they are obtained and ensuring that they contain no errors and are in a suitable format for processing. Data pre-processing is essential for building any effective and consistent visual representation.
In NLP, data pre-processing consists mainly of a series of transformations applied to the input data, in our case several posts in TXT format, with the aim of obtaining standardized data free of elements that may affect the quality of the results, in order to facilitate subsequent processing for tasks such as generating a word cloud, mining opinions/sentiment or producing automatic summaries of the input texts. In general, the flow to be followed for word pre-processing includes the following steps:
- Cleaning: removal of special characters and symbols that distort the results, such as punctuation marks.
- Tokenize: Tokenization is the process of separating a text into smaller units, tokens. Tokens can be sentences, words, or even characters.
- Stemming and lemmatisation: this process consists of transforming words to their basic form, that is, to their root or canonical form (lemma), removing plurals, verb tenses or genders. This step is sometimes optional, since it is only needed when the subsequent processing has to capture the semantic similarity between the different words of the text.
- Elimination of stop words: stop words or empty words are those words of common use that do not contribute in a significant way to the text. These words should be removed before text processing as they do not provide any unique information that can be used for the classification or grouping of the text, for example, determining articles such as 'a', 'an', 'the' etc.
- Vectorization: in this step we transform the tokenized texts into vectors of real numbers generated from the frequency with which each word appears in the text. Vectorization allows machines to process text and apply, among others, machine learning techniques.
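As a small self-contained illustration of this last step (a toy example, separate from the code used later in this exercise), CountVectorizer from Scikit-Learn builds precisely this kind of frequency matrix from a couple of short sentences:

from sklearn.feature_extraction.text import CountVectorizer

# Two toy sentences; in the exercise the rows will be whole posts.
frases = ["los datos abiertos impulsan la reutilización",
          "los datos abiertos mejoran la transparencia"]

vectorizador = CountVectorizer()
matriz = vectorizador.fit_transform(frases)

print(vectorizador.vocabulary_)  # token -> column index
print(matriz.toarray())          # one row per sentence, one column per token counting occurrences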

4.1. Installation and loading of libraries
Before starting with the data pre-processing, we need to import the libraries we will work with. Python provides a vast number of libraries that allow us to implement functionality for many tasks, such as data visualization, Machine Learning, Deep Learning or Natural Language Processing, among many others. The libraries we will use for this analysis and visualization are the following:
- os, which allows access to operating system-dependent functionality, such as manipulating the directory structure.
- re, provides functions for processing regular expressions.
- pandas, is a very popular and essential library for processing data tables.
- string, provides a series of very useful functions for handling strings.
- matplotlib.pyplot, contains a collection of functions that will allow us to generate the graphical representations of the word clouds.
- sklearn.feature_extraction.text (Scikit-Learn library), converts a collection of text documents into a vector array. From this library we will use some commands that we will discuss later.
- wordcloud, library with which we can generate the word cloud.
# Import the libraries needed for this analysis and visualization.
import os
import re
import pandas as pd
import string
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from wordcloud import WordCloud

4.2. Data loading
Once the libraries are loaded, we prepare the data we are going to work with. Before starting to load the data, the working directory needs to contain: (a) a folder called "post" with all the TXT files we are going to work with, which are available in the repository of this project on the datos.gob.es GitHub; (b) a file called "stop_words_spanish.txt" containing the list of stop words in Spanish, which is also available in said repository; and (c) a folder called "imagenes" where we will save the word cloud images in PNG format, which we will create below.
# Create the "imagenes" folder.
nueva_carpeta = "imagenes/"
try:
    os.mkdir(nueva_carpeta)
except OSError:
    print("Ya existe una carpeta llamada %s" % nueva_carpeta)
else:
    print("Se ha creado la carpeta: %s" % nueva_carpeta)

Next, we will proceed to load the data. The input data, as already mentioned, are TXT files, each containing one post. As we want to analyse and visualize several posts at the same time, we will load all the texts of interest into our development environment and then insert them into a single table or dataframe.
# Build a list of all the files to be read, indicating the folder where they are located.
filePath = []
for file in os.listdir("./post/"):
    filePath.append(os.path.join("./post/", file))

# Build a dataframe containing one row per post.
post_df = pd.DataFrame()
for file in filePath:
    with open(file, "rb") as readFile:
        post_df = post_df.append(pd.DataFrame([readFile.read().decode("utf8")]),
                                 ignore_index=True)

# Name the column that contains the texts in the dataframe.
post_df.columns = ["texto"]

4.3. Data pre-processing
To achieve our objective of generating a word cloud for each post, we will perform the following pre-processing tasks.
a) Data cleansing
Once a table containing the texts with which we are going to work has been generated, we must eliminate the noise beyond the text that interests us: special characters, punctuation marks and carriage returns.
First, we put all characters in lowercase to avoid any errors in case-sensitive processes, by using the lower() command.
Then we eliminate punctuation marks, such as periods, commas, exclamation marks and question marks, among many others. To remove them we use the pre-initialized string.punctuation constant of the string library, which returns the set of symbols considered punctuation marks. In addition, we must eliminate tabs, carriage returns and extra spaces, which do not provide information in this analysis, using regular expressions.
It is essential to apply all these steps in a single function so that they are processed sequentially, because all processes are highly related.
# Remove punctuation marks, carriage returns/tabs and extra blank spaces.
# To do this we define a function with all the changes we want to apply to the text.
def limpiar_texto(texto):
    texto = texto.lower()
    texto = re.sub(r"\[.*?¿\]\%", " ", texto)
    texto = re.sub("[%s]" % re.escape(string.punctuation), " ", texto)
    texto = re.sub(r"\w*\d\w*", " ", texto)
    return texto

# Apply the changes to the texts.
post_clean = pd.DataFrame(post_df.texto.apply(limpiar_texto))

b) Tokenize
Once we have eliminated the noise in the texts with which we are going to work, we will tokenize each of the texts in words. For this we will use the split() function, using space as a separator between words. This will allow separating the words independently (tokens) for future analysis.
# Tokenize the texts. A new column is added to the table with the "tokenized" text.
def tokenizar(texto):
    tokens = texto.split(sep=" ")
    return tokens

post_clean["texto_tokenizado"] = post_clean["texto"].apply(lambda x: tokenizar(x))

c) Removal of "stop words"
After removing punctuation marks and other elements that can distort the target visualization, we will remove the "stop words". To carry out this step we use a list of Spanish stop words, since each language has its own list. This list consists of a total of 608 words, including articles, prepositions, linking verbs and adverbs, among others, and is updated regularly. It can be downloaded in TXT format from the datos.gob.es GitHub account and must be located in the working directory.
# Read the file containing the Spanish stop words.
with open("stop_words_spanish.txt", encoding="UTF8") as f:
    stop_words = f.read().splitlines()

To this list of words we will add new words that do not contribute relevant information to our texts or that appear recurrently because of their context. In this case, since all the posts deal with open data, a number of words appear repeatedly in every text and are very likely to be the most frequent ones, so we should eliminate them. Some of these words are "item", "data", "open" or "case", among others. This will produce a graphic representation that is more representative of the content of each post.
On the other hand, a visual inspection of the results obtained allows us to detect words or characters derived from errors in the texts, which obviously have no meaning and were not eliminated in the previous steps. These should be removed from the analysis so that they do not distort the subsequent results. They are words like "nen", "nun" or "nla".
# Update our stop word list.
stop_words.extend(("caso", "forma", "unido", "abiertos", "post", "espera",
                   "datos", "dato", "servicio", "nun", "día", "nen", "data",
                   "conjuntos", "importantes", "unido", "unión", "nla", "r", "n"))

# Remove the stop words from each tokenized text and rejoin the remaining tokens into a string.
post_clean["texto_tokenizado"] = post_clean["texto_tokenizado"].apply(
    lambda tokens: " ".join(t for t in tokens if t not in stop_words))

d) Vectorization
Machines are not capable of understanding words and sentences as such, so these must be transformed into some numerical structure. The method consists of generating vectors from the tokens. In this post we use a simple technique known as bag-of-words (BoW), which consists of assigning each token a weight proportional to its frequency of appearance in the text. To do this, we work with a matrix in which each row represents a text and each column a token. To perform the vectorization we will use the CountVectorizer() and TfidfTransformer() classes of the scikit-learn library.
The CountVectorizer() function allows you to transform text into a vector of frequencies or word counts. In this case we will obtain 6 vectors with as many dimensions as there are tokens in each text, one for each post, which we will integrate into a single matrix, where the columns will be the tokens or words and the rows will be the posts.
# Compute the word frequency matrix of the texts.
vectorizador = CountVectorizer()
post_vec = vectorizador.fit_transform(post_clean.texto_tokenizado)

Once the word frequency matrix has been generated, it must be converted into a normalized vector form in order to reduce the impact of tokens that appear very frequently in the text. To do this we will use the TfidfTransformer() function.
# Convert the word frequency matrix into a regularized vector form.
transformer = TfidfTransformer()
post_trans = transformer.fit_transform(post_vec).toarray()

If you want to know more about the importance of applying this technique, you will find numerous articles on the Internet explaining it and how relevant it is, among other things, for SEO optimization.
5. Creation of the word cloud
Once we have concluded the pre-processing of the text, as we indicated at the beginning of the post, it is possible to perform NLP tasks. In this exercise we will create a word cloud or "WordCloud" for each of the analyzed texts.
A word cloud is a visual representation of the words with the highest rate of occurrence in a text. It allows us to detect in a simple way the frequency and importance of each word, making it easier to identify the keywords and to discover at a single glance the main theme treated in the text.
For this we are going to use the "wordcloud" library, which incorporates the functions needed to build each representation. First, we have to indicate the characteristics that each word cloud should present, such as the background color (background_color parameter), the color map that the words will take (colormap parameter), the maximum font size (max_font_size parameter) or a seed so that the generated word cloud is always the same in future runs (random_state parameter). We can apply these and many other parameters to customize each word cloud.
# Set the characteristics that each word cloud will present.
wc = WordCloud(stopwords = stop_words, background_color = "black",
               colormap = "hsv", max_font_size = 150, random_state = 123)

Once we have indicated the characteristics that we want each word cloud to present, we proceed to create them and save them as an image in PNG format. To generate the word clouds, we will use a loop in which we call different functions of the matplotlib library (represented by the plt prefix) needed to graphically generate the word clouds according to the specification defined in the previous step. We have to specify that a word cloud is created for each row of the table, that is, for each text, with the function plt.subplot(). With the command plt.imshow() we indicate that the result is a 2D image. If we do not want the axes to be displayed, we must indicate it with the plt.axis() function. Finally, with the function plt.savefig() we save the generated visualization.
# Generate the word clouds for each of the posts.
for index, i in enumerate(post_clean.index):
    wc.generate(post_clean.texto_tokenizado[i])
    plt.subplot(3, 2, index + 1)
    plt.imshow(wc, interpolation = "bilinear")
    plt.axis("off")
plt.savefig("imagenes/nubes_de_palabras.png")  # illustrative file name

# Show the resulting word clouds.
plt.show()

The visualization obtained is:

Visualization of the word clouds obtained from the texts of different posts of the blog section of datos.gob.es
6. Conclusions
Data visualization is one of the most powerful mechanisms for exploiting and analyzing the implicit meaning of data, regardless of the data type and the degree of technological knowledge of the user. Visualizations allow us to extract meaning out of the data and create narratives based on graphical representation.
Word clouds are a tool that allows us to speed up the analysis of textual data, since through them we can quickly and easily identify and interpret the words with the greatest relevance in the analyzed text, which gives us an idea of the subject.
If you want to learn more about Natural Language Processing, you can consult the guide "Emerging Technologies and Open Data: Natural Language Processing" and the posts "Natural Language Processing" and "The latest news in natural language processing: summaries of classic works in just a few hundred words".
Hopefully this step-by-step visualization has taught you a few things about the ins and outs of Natural Language Processing and word cloud creation. We will return to show you new data reuses. See you soon!
Red.es, in collaboration with the Secretary of State for Digitalization and Artificial Intelligence, organizes the fourth edition of the Aporta Challenge. Like other years, it seeks to identify and recognize new ideas and prototypes that drive improvements in a specific sector through the use of open data. This year the focus will be on health and wellness.
The objective is to encourage the talent, technical competence and creativity of the participants while promoting the direct reuse of data published by various public bodies.
Why is it important to promote the use of open data in the health and wellness sector?
If there is one sector where there has been a demand for data in the last year, it has been healthcare, due in part to the global COVID-19 pandemic. However, open data related to health and well-being not only serves to inform citizens in an exercise of transparency, but is also essential as a basis for solutions that drive improvements in health care, the patient experience and services that are offered in order to ensure a better quality of life. Open data has the ability to make essential public services more efficient, effective and inclusive, which in turn contributes to reducing inequality.
Aware of this, the European Commission has among its priorities the creation of a European data space in the health sector. For its part, the Spanish Government has begun work to establish a health data lake, which will make a large amount of raw data available to researchers so they can analyze it, make predictions or spot trends. The use of this data, anonymised or aggregated, can generate great improvements that some autonomous communities are already taking advantage of, such as Andalusia, which is working, in collaboration with Red.es, on the implementation of an advanced analytics system based on Big Data technologies to improve the quality of life of chronic patients.
Under the slogan "The value of data for the health and well-being of citizens" an open competition will be held which, like other years, will consist of two phases:
Phase I: Ideas competition. From 22/11/2021 to 15/02/2022
Participants must present an idea that responds to the proposed challenge: it is about identifying new opportunities to capture, analyze and use data intelligence to drive improvements related to health and well-being.
Proposals must be based on the use of at least one set of open data generated by Public Administrations, whether national or international. These data may be combined with others of a public or private nature.
The idea presented must be original and must not have received an award previously. Pre-existing solutions are not valid: proposals must be developed by the participant within the framework of the Challenge.
A jury made up of experts in the field will assess the ideas presented and will choose 10, which go on to phase II. The criteria that will be taken into account for the evaluation are:
- Relevance
- Quality and overall clarity of the proposed idea
- Impact of the proposed idea on the improvement of the health and well-being of citizens
- Data sources used
- Promotion of the quality of life of vulnerable groups
Phase II: Prototype development and face-to-face exhibition. April-June.
The participants whose ideas have been selected in the previous phase will develop the associated prototype and present it to the members of the Jury. For this they will have a period of 4 months.
The prototypes may be a functional solution; a visualization, dynamic graphic element or multimedia piece that simulates the service; or a functional demonstration built by extrapolating an existing solution from, for example, another sector, country or area.
In this case, the evaluation criteria will be:
- Ease of use
- Technical quality of the prototype
- Viability
- Exhibition quality
The three proposals with the highest score will be the winners and will receive the following financial award:
- First classified: € 5,000
- Second classified: € 4,000
- Third classified: € 3,000
Examples of challenges to solve
Some examples of challenges that the proposed solutions can address are:
- Promote the acquisition and consolidation of healthy habits among citizens
- Get more effective medical care and improve the patient experience
- Increase public health and epidemiological surveillance capacities
- Obtain better health and wellness outcomes for the dependent, chronically ill, or elderly
- Ensure that all people have the maximum opportunity to preserve their health and well-being
- Optimize the training and development of healthcare professionals
- Promote research to help discover healthy patterns and new treatments
- Encourage the sharing of successful experiences
- Leverage data to improve patient and service user safety
Who can participate?
Registration is open to any person, team or company domiciled in the European Union, who wants to contribute to economic and social development in the field of health and well-being.
How can I sign up?
You can register through the red.es electronic area. For this you will need your password or electronic certificate. The deadline ends on February 15, 2022 at 1:00 p.m.
You must include a detailed description of your idea and its value to society, as well as a descriptive video of the idea in Spanish, between 120 and 180 seconds long.
You can find all the information at red.es and in the rules of the Challenge.
Get inspired by the finalist works from previous editions!
For inspiration, look at work from previous years.
- III Aporta Challenge: "The value of data in digital education"
- II Aporta Challenge: "The value of data for the agri-food, forestry and rural sectors"
- I Aporta Challenge: "The value of data for the Administration"
Go ahead and participate!
If you would like to help spread this information to others, we offer materials (in Spanish) that will make it easier for you:

On November 2, the awards ceremony for the winners of the V edition of the Castilla y León Open Data Contest took place. This competition, organized by the Ministry of Transparency, Spatial Planning and Foreign Action of the Junta de Castilla y León, rewards projects that provide ideas, studies, services, websites or mobile applications using datasets from its Open Data Portal.
The event was attended by Francisco Igea, Vice President of the Junta de Castilla y León, and Joaquín Meseguer, Director General for Transparency and Good Governance, who presented the awards to the winners.

Of the 37 applications received, a jury of experts in the field has chosen 8 projects that have emerged as winners in the various categories.
Category Ideas
In this category, proposals to develop studies, services, websites or applications for mobile devices are awarded.
- The first prize, € 1,500, went to APP SOLAR-CYL, a web tool for optimal sizing of photovoltaic solar self-consumption installations. Aimed at both citizens and energy managers of the Public Administration, the solution seeks to support the analysis of the technical and economic viability of this type of system. The idea was presented by professors from the Electrical Engineering Area, members of the ERESMA (Energy Resources' Smart Management) research group at the University of León: Miguel de Simón Martín, Ana María Díez Suárez, Alberto González Martínez, Álvaro de la Puente Gil and Jorge Blanes Peiró.
- The second prize, € 500, went to Dónde te esperan: Recommender for municipalities in Spain, by Mario Varona Bueno. Thanks to this tool, users will be able to find the best place to live based on their preferences or even chance.
Category Products and Services
This category differs from the previous one in that ideas are no longer awarded, but projects accessible to all citizens via the web through a URL.
- Repuéblame won the first prize, which this time consisted of € 2,500. Presented by Guido García Bernardo and Javier García Bernardo, it consists of a website to rediscover the best places to live or telework. The application catalogs the Castilian-Leon municipalities based on a series of numerical indicators, of its own creation, related to quality of life.
- The second prize, € 1,500, went to Plagrícola: Avisos de Plagas Agrícolas CyL by José María Pérez Ramos. It is a mobile app that informs farmers of the pest alerts published by the Instituto Tecnológico Agrario de Castilla y León (Itacyl), so that they can carry out the necessary preventive and curative measures.
- Completing the podium is disCAPACIDAD.es, which won the third prize of €500. Its creator, Luis Hernández Fuentevilla, has developed a website that centralizes offers and aid related to employment for people with disabilities in Castilla y León.
- This category also included a prize for students of €1,500. The winner was Ruta x Ruta x Castilla y León, presented by Adrián Arroyo Calle. This web application allows users to consult routes of all kinds, as well as points of interest located in their vicinity, such as monuments, restaurants or events. The solution also allows users to share their tracks (GPS recordings).
In addition, honorable mentions went to the candidatures presented by students: "APP BOCYL Boletín Oficial Castilla y León", by Nicolás Silva Brenes, and "COVID CyL", presented by Silvia Pedrón Hermosa. This seeks to encourage students to work with data and present their projects to future calls.
Data Journalism Category
This category is designed to reward relevant journalistic pieces published or updated in any medium, whether written or audiovisual.
- MAPA COVID-19: check how many coronavirus cases there are and how occupied your hospital is, from the Maldita association against disinformation, won the first prize of €1,500. After compiling all the available data that the regional governments publish about hospitals, Maldita has created a map showing how many coronavirus cases there are and how occupied the different hospitals are. Information on ICU occupancy is also included for some hospitals.
- The second prize, €1,000, went to De cero a 48.000 euros: who earns how much in the town halls of Castilla y León. Annual Compensation Explorer. Through various visualizations, its creator, Laura Navarro Soler, reveals information available in the Información Salarial Puestos de la Administración (ISPA), cross-referenced with other data such as the number of inhabitants in each region or the level of dedication (exclusive, partial or non-dedicated).
The "Didactic Resource" category was declared void. The jury considered that the candidatures presented did not meet the criteria set out in the rules.
In total, the 8 awarded projects have received 12,000 euros. They will also have the option of participating in a business development consultancy.
Congratulations to all the winners!
These infographics show examples of the use of open data in certain sectors, as well as data from studies on its impact. New content will be published periodically.
1. Open science and citizen science: the combination that transforms research
Published: July 2025. Planning the publication of open data from the outset of a citizen science project is key to ensuring the quality and interoperability of the data generated, facilitating its reuse, and maximizing the scientific and social impact of the project.
2. Open Data and Urban Management: Innovative Use Cases
Published: July 2024. Municipal innovation through the use of open data presents a significant opportunity to improve the accessibility and efficiency of municipal services. In this infographic, we collect examples of applications that contribute to the improvement of urban sectors such as transport and mobility, organisation of basic public services, environment and sustainability, and citizen services.
3. Open data for Sustainable City Development
Published: August 2023. In this infographic, we have gathered use cases that utilize open datasets to monitor and/or enhance energy efficiency, transportation and urban mobility, air quality, and noise levels, all issues that contribute to the proper functioning of urban centers.
4. The benefits of open data in education
Published: May 2023. Open data is key to the strengthening and progress of education, and we must not forget that education is a universal right and one of the main tools for the progress of humanity. In this infographic we summarize the benefits of using open data in education.
5. LegalTech: Transformative potential of legal services
Published: August 2022. The LegalTech concept refers to the use of new technological processes and tools to offer more efficient legal services. For all these tools to work properly, it is necessary to have valuable data. In this sense, open data represents a great opportunity. Find out more in this infographic.
6. How is open data used in the health and welfare sector?
Published: September 2021. Open health data is essential for management and decision-making by our governments, but it is also fundamental as a basis for solutions that help both patients and doctors. This infographic shows several examples, both of applications that collect health services and of tools for forecasting and diagnosing diseases, among others.
7. Open data use cases to care for the environment and fight climate change
Published: November 2020. This interactive infographic shows the strategic, regulatory and political situation affecting the world of open data in Spain and Europe. It includes the main points of the European Data Strategy, the Regulation on Data Governance in Europe or the Spain Digital 2025 plan, among others.
8. Public administrations faced with the reuse of public information
Published: August 2020. Public administrations play an important role in the open data ecosystem, both as information providers and consumers. This infographic contains a series of examples of success stories and best practices, compiled in the report "Las Administraciones Públicas ante la reutilización de la información pública" by the National Observatory of Telecommunications and the Information Society (ONTSI).
9. The importance of opening cultural data
Published: June 2020. Did you know that 90% of the world's cultural heritage has not yet been digitized? Discover in this infographic the benefits of open cultural data, as well as examples of the products that can be created through its reuse, and success stories of museums that share open collections.
10. Open data, especially valuable for small and medium-sized companies
Published: March 2020. This infographic shows the results of the study "The Impact of open data: opportunities for value creation in Europe", conducted by the European Data Portal. Find out what the expected annual growth rate is, both in terms of turnover and jobs (only available in Spanish).
In recent months, the Canary Islands open data initiative has focused its strategy on centralization, with the aim of giving citizens access to public information through a single access point. With this in mind, it has launched a new version of its open data portal, data.canarias.es, and has continued to develop projects that display data in a simple and unified way. One example is the recently published Government Organizational Chart.
Interesting data on the organizational structure and policy makers in a single portal
The Organizational chart of the Government of the Canary Islands is a web portal that openly offers information of interest related to both the organic structure and the political representatives in the Government of the Canary Islands. Its launch has been carried out by the General Directorate of Transparency and Citizen Participation, dependent on the Ministry of Public Administrations, Justice and Security of the Government of the Canary Islands.
From the beginning, the Organizational Chart has been designed with open data by default, accessibility and usability in mind. The tool consumes data published on the Canary Islands open data portal, such as the salaries of public officials and temporary staff, through its API. It also includes numerous sections with information on autonomous bodies, public entities, public business entities, commercial companies, public foundations, consortia and collegiate bodies, automatically extracted from the corporate tool Directory of Administrative Units and Registry and Citizen Services Offices (DIRCAC).
In turn, all the content of the Organizational Chart is published, automatically and periodically, on the Canary Islands open data portal: the remuneration, the organic structure, and the registry and citizen service offices, among other data. The datasets are updated automatically once a month, taking the data from the information systems and publishing them on the open data portal. This frequency is fully configurable and can be adjusted as needed.
From the organization chart itself, all data can be downloaded both in open and editable format (ODT) and in a format that allows it to be viewed on any device (PDF).
What information is available in the Organization Chart?
Thanks to the Organizational Chart, citizens can find out who is part of the regional government. The information appears divided into eleven main areas: Presidency and the ten ministries.
The Organizational Chart provides the résumés, salaries and asset declarations of all the senior officials that make up the Government, as well as of the temporary staff who work with them. Likewise, it also displays the email addresses, office addresses and web pages of each area. All this information is constantly updated, reflecting any changes that occur.
Regarding the public-activity transparency agendas, which are accessible from each public official's profile in the Organizational Chart, it is worth noting that, thanks to this work, it has been possible to:
- Update the application so that agendas can be managed from any device (mobile, tablet, PC, etc.), thereby making it easier for the people responsible to use them.
- Categorize the events that, in addition, are visually highlighted by a color code, thus facilitating their location by the public.
- Publish immediately, and automatically, all the changes that are made in the agendas in the Organization Chart.
- Incorporate more information, such as the location of the events or the data of the attendees.
- Download the data of the calendars in open formats such as CSV or JSON, or in ICAL format, which will allow adding these events to other calendars.
- Publish all the information of the agendas in the Canary Islands open data portal, including an API for direct consumption.
For now, the agendas of the members of the Governing Council and the vice-counselors have been published, but it is planned that the agendas of the rest of the public officials of the Government of the Canary Islands will be incorporated progressively.
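As an illustration only, the sketch below shows how one of these agenda exports could be loaded into R once it has been downloaded from the open data portal. The file name agenda.csv, the semicolon separator and the use of the readr package are assumptions for the example, not details taken from the portal itself.
# Minimal sketch, assuming an agenda export has already been downloaded from
# the Canary Islands open data portal and saved locally as "agenda.csv"
# (the file name and separator are assumptions for this example)
library(readr)
library(dplyr)

agenda <- read_delim("agenda.csv", delim = ";",
                     locale = locale(encoding = "UTF-8"))

# Quick overview of the events contained in the export
glimpse(agenda)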
The Organization Chart was presented at the International Congress on Transparency, held last September in Alicante, as an example not only of openness and reuse of data, but also of transparency and accountability. All this has been developed by a team of people committed to transparency, accountability and open data, all principles of Open Government, in order to offer better services to citizens.
1. Introduction
Data visualization is a task linked to data analysis that aims to graphically represent the underlying information in the data. Visualizations play a fundamental role in the communicative function of data, since they allow conclusions to be drawn in a visual and understandable way, and also make it possible to detect patterns, trends and anomalous data, or to project predictions, among other functions. This makes their application transversal to any process in which data is involved. The visualization possibilities are very numerous, from basic representations, such as line charts, bar charts or pie charts, to complex visualizations configured as interactive dashboards.
Before building an effective visualization, we must pre-process the data, paying attention to how it is obtained and validating its content, ensuring that it is free of errors and in an adequate, consistent format for processing. Data pre-processing is essential for any data analysis task that results in effective visualizations.
A series of practical data visualization exercises based on open data available on the datos.gob.es portal or other similar catalogues will be published periodically. Each exercise will describe, in a simple way, the stages necessary to obtain the data, to carry out the transformations and analyses relevant to the creation of interactive visualizations, and to extract as much information as possible in the final conclusions. In each exercise, simple, well-documented code developments will be used, as well as tools that are free and open to use. All the generated material will be available for reuse in the Data Lab repository on GitHub.

Visualization of traffic accidents occurring in the city of Madrid, by district and type of vehicle
2. Objectives
The main objective of this post is to learn how to create an interactive visualization based on open data available on this portal. For this exercise we have chosen a dataset that covers a wide time period and contains relevant information on the traffic accidents registered in the city of Madrid. From these data we will observe which types of accident are most common in Madrid and how variables such as age, type of vehicle or the harm caused by the accident influence them.
3. Resources
3.1. Datasets
For this analysis, a dataset available on datos.gob.es on traffic accidents in the city of Madrid, published by the City Council, has been selected. This dataset contains a time series covering the period 2010 to 2021, with different subcategories that facilitate the analysis of the characteristics of the accidents that occurred, for example, the environmental conditions at the time of each accident or the type of accident. Information on the structure of each data file is available in documents covering the periods 2010-2018 and 2019 onwards. It should be noted that there are inconsistencies between the data before and after 2019 due to changes in the data structure. This is a common situation that data analysts face when pre-processing the data to be used, and it derives from the lack of a homogeneous data structure over time: for example, changes in the number of variables, in the type of variables or in the measurement units. It is a compelling reason to accompany every open dataset with complete documentation explaining its structure.
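Such structural differences can also be detected programmatically. The short sketch below, provided only as an orientation, compares the column names of two of the yearly files; the objects acc_2018 and acc_2019 are assumed names for two yearly CSVs already read into R (for instance with the loading code of section 4.2) and do not appear in the original exercise.
# Minimal sketch, assuming two of the yearly files have already been read into
# R as acc_2018 and acc_2019 (hypothetical object names)
columnas_2018 <- names(acc_2018)
columnas_2019 <- names(acc_2019)

# Variables that appear only before or only after the 2019 structure change
setdiff(columnas_2018, columnas_2019)
setdiff(columnas_2019, columnas_2018)

# Variables shared by both structures
intersect(columnas_2018, columnas_2019)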
3.2. Tools
R (version 4.0.3) and RStudio with the RMarkdown extension have been used to carry out the pre-treatment of the data (setting up the working environment, programming and writing).
R is an interpreted, object-oriented, open-source programming language, initially created for statistical computing and the creation of graphical representations. Nowadays it is a very powerful, continuously updated tool for all types of data processing and manipulation. It has an associated development environment, RStudio, which is also open source.
The Kibana tool has been used for the creation of the interactive visualization.
Kibana is an open-source tool belonging to the Elastic Stack product suite (Elasticsearch, Beats, Logstash and Kibana) that enables the creation of visualizations and the exploration of data indexed on top of the Elasticsearch analytics engine.
If you want to know more about these tools, or about any other that can help you with data processing and the creation of interactive visualizations, you can consult the report "Data processing and visualization tools".
4. Data processing
To carry out the subsequent analysis and visualizations, the data must be prepared adequately so that the results obtained are consistent and effective. We must perform an exploratory data analysis (EDA) in order to know and understand the data we want to work with. The main objective of this pre-processing is to detect possible anomalies or errors that could affect the quality of the subsequent results and to identify the patterns of information contained in the data.
To make things easier for readers who are not specialized in programming, the R code included below (which you can access by clicking on the "Code" button in each section) is not designed to maximize efficiency but to be easy to follow, so more advanced readers may consider more efficient ways of coding some of the functionalities. Readers can reproduce this analysis if they wish, as the source code is available on the datos.gob.es GitHub account. The code is provided as a plain-text document that, once loaded into the development environment, can easily be executed or modified.
4.1. Installation and loading of libraries
For this analysis we need to install a series of R packages in addition to the base distribution, incorporating the functions and objects they define into the working environment. There are many packages available in R, but the most suitable for working with this dataset are tidyverse, lubridate and data.table. Tidyverse is a collection of R packages (it contains other packages such as dplyr, ggplot2, readr, etc.) specifically designed for data science, facilitating data loading and processing, graphical representations and other essential data analysis functionalities; a progressively deeper knowledge is required to get the most out of the packages it integrates. The lubridate package will be used to handle date variables, and the data.table package allows more efficient management of large datasets. These packages need to be downloaded and installed in the development environment.
# List of libraries we want to install and load in our development environment
librerias <- c("tidyverse", "lubridate", "data.table")

# Download and install the libraries in our development environment
package.check <- lapply(librerias,
                        FUN = function(x) {
                          if (!require(x, character.only = TRUE)) {
                            install.packages(x, dependencies = TRUE)
                            library(x, character.only = TRUE)
                          }
                        })
4.2. Uploading and cleaning data
a. Loading datasets
The data we are going to use in the visualization are split into yearly CSV files. Since we want to analyze several years, we must download all the datasets of interest and load them into our development environment.
To do this, we create the working directory "datasets", where we will download all the files. We use two lists, one with the URLs where the datasets are located and another with the names we assign to each file saved on our machine; this makes it easier to reference these files later.
# Create a folder in our working directory to store the downloaded datasets
if (!dir.exists("datasets")) {
  dir.create("datasets")
}
# Move into the folder
setwd("datasets")

# List of the datasets we want to download
datasets <- c("https://datos.madrid.es/egob/catalogo/300228-10-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-11-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-12-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-13-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-14-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-15-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-16-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-17-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-18-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-19-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-21-accidentes-trafico-detalle.csv",
              "https://datos.madrid.es/egob/catalogo/300228-22-accidentes-trafico-detalle.csv")

# Names we assign to each downloaded file
files <- c("Accidentalidad2010", "Accidentalidad2011", "Accidentalidad2012",
           "Accidentalidad2013", "Accidentalidad2014", "Accidentalidad2015",
           "Accidentalidad2016", "Accidentalidad2017", "Accidentalidad2018",
           "Accidentalidad2019", "Accidentalidad2020", "Accidentalidad2021")

# Download the datasets of interest and read them into a list
dt <- list()
for (i in 1:length(datasets)) {
  download.file(datasets[i], files[i])
  dt[[i]] <- read_delim(files[i], delim = ";", escape_double = FALSE,
                        locale = locale(encoding = "WINDOWS-1252"), trim_ws = TRUE)
}
b. Creating the worktable
Once we have all the datasets loaded into our development environment, we create a single worktable that integrates all the years of the time series.
Accidentalidad <- rbindlist(dt, use.names = TRUE, fill = TRUE)
Once the worktable is generated, we must solve one of the most common problems in any data pre-processing: inconsistencies in the naming of variables across the different files that make up the time series. This anomaly produces variables with different names that nevertheless represent the same information. In this case the mapping is explained in the data dictionary included in the files' documentation; if that were not the case, it would be necessary to resort to observing and exploring the files descriptively. Here, for example, the age-range variable ("RANGO EDAD") appears under a different name in the 2010-2018 files than in the 2019-2021 files, even though it contains the same data. To solve this problem, we must merge the variables affected by this anomaly into a single variable.
# With the unite() function we merge the affected variables. We must indicate the name of
# the table, the name we want to assign to the new variable and the positions of the
# variables we want to merge.
Accidentalidad <- unite(Accidentalidad, LESIVIDAD, c(25, 44), remove = TRUE, na.rm = TRUE)
Accidentalidad <- unite(Accidentalidad, NUMERO_VICTIMAS, c(20, 27), remove = TRUE, na.rm = TRUE)
Accidentalidad <- unite(Accidentalidad, RANGO_EDAD, c(26, 35, 42), remove = TRUE, na.rm = TRUE)
Accidentalidad <- unite(Accidentalidad, TIPO_VEHICULO, c(20, 27), remove = TRUE, na.rm = TRUE)
Once we have the table with the complete time series, we create a new table keeping only the variables that are relevant for the interactive visualization we want to develop.
Accidentalidad <- Accidentalidad %>% select(c("FECHA", "DISTRITO", "LUGAR ACCIDENTE",
                                              "TIPO_VEHICULO", "TIPO_PERSONA",
                                              "TIPO ACCIDENTE", "SEXO", "LESIVIDAD",
                                              "RANGO_EDAD", "NUMERO_VICTIMAS"))
c. Variable transformation
Next, we examine the type of variables and values to transform the necessary variables to be able to perform future aggregations, graphs or different statistical analyses.
# Re-adjust the date variable
Accidentalidad$FECHA <- dmy(Accidentalidad$FECHA)

# Re-adjust the rest of the variables to factor type
Accidentalidad$`TIPO ACCIDENTE` <- as.factor(Accidentalidad$`TIPO ACCIDENTE`)
Accidentalidad$TIPO_VEHICULO <- as.factor(Accidentalidad$TIPO_VEHICULO)
Accidentalidad$TIPO_PERSONA <- as.factor(Accidentalidad$TIPO_PERSONA)
Accidentalidad$RANGO_EDAD <- as.factor(Accidentalidad$RANGO_EDAD)
Accidentalidad$SEXO <- as.factor(Accidentalidad$SEXO)
Accidentalidad$LESIVIDAD <- as.factor(Accidentalidad$LESIVIDAD)
Accidentalidad$DISTRITO <- as.factor(Accidentalidad$DISTRITO)
d. Creation of new variables
We will split the variable "FECHA" into a hierarchy of date-related variables: "Año" (year), "Mes" (month) and "Día" (day of the week). This is a very common step in data analytics, since it is often interesting to analyze other time ranges, such as years, months or weeks, or to generate aggregations by day of the week.
# Create the variable Año (year)
Accidentalidad$Año <- year(Accidentalidad$FECHA)
Accidentalidad$Año <- as.factor(Accidentalidad$Año)

# Create the variable Mes (month)
Accidentalidad$Mes <- month(Accidentalidad$FECHA)
Accidentalidad$Mes <- as.factor(Accidentalidad$Mes)
levels(Accidentalidad$Mes) <- c("Enero", "Febrero", "Marzo", "Abril", "Mayo", "Junio",
                                "Julio", "Agosto", "Septiembre", "Octubre", "Noviembre", "Diciembre")

# Create the variable Dia (day of the week)
Accidentalidad$Dia <- wday(Accidentalidad$FECHA)
Accidentalidad$Dia <- as.factor(Accidentalidad$Dia)
levels(Accidentalidad$Dia) <- c("Domingo", "Lunes", "Martes", "Miercoles",
                                "Jueves", "Viernes", "Sabado")
e. Detection and processing of missing data
The detection and processing of missing data (NAs) is an essential task in order to be able to process the variables contained in the table, since missing data can cause problems when performing aggregations, graphs or statistical analyses.
Next, we will analyze the absence of data (detection of NAs) in the table:
# Total number of NAs in the dataset
sum(is.na(Accidentalidad))

# Proportion of NAs in each of the variables
colMeans(is.na(Accidentalidad))
Once the NAs present in the dataset have been detected, we must deal with them in some way. In this case, as all the variables of interest are categorical, we will fill the missing values with the new value "No asignado" (unassigned); this way we do not lose sample size or relevant information.
# Replace the NAs in the table with the value "No asignado".
# For the factor variables, "No asignado" is first added as an explicit level;
# the remaining NAs (in non-factor columns) are then replaced directly.
Accidentalidad <- Accidentalidad %>%
  mutate(across(where(is.factor), ~ fct_explicit_na(.x, na_level = "No asignado")))
Accidentalidad[is.na(Accidentalidad)] <- "No asignado"
f. Level assignments in variables
Once we have the variables of interest in the table, we can analyze the data and the categories of each variable more exhaustively. If we analyze each one independently, we can see that some of them have duplicated categories that differ only in accents, special characters or capitalization. We will reassign the levels of the variables that require it, so that future visualizations or statistical analyses can be built efficiently and without errors.
For reasons of space, in this post we will only show an example with the variable "LESIVIDAD" (harm caused). Until 2018 this variable was coded with one set of categories (IL, HL, HG, MT), while from 2019 onwards other categories were used (values from 0 to 14). Fortunately, this task is easy to approach here, since the mapping is documented in the structure information accompanying each dataset. As mentioned before, this is not always the case, and the lack of such documentation greatly hinders this type of data transformation.
# Check the categories of the variable "LESIVIDAD"
levels(Accidentalidad$LESIVIDAD)

# Assign the new categories
levels(Accidentalidad$LESIVIDAD) <- c("Sin asistencia sanitaria", "Herido leve",
                                      "Herido leve", "Herido grave", "Fallecido",
                                      "Herido leve", "Herido leve", "Herido leve",
                                      "Ileso", "Herido grave", "Herido leve",
                                      "Ileso", "Fallecido", "No asignado")

# Check the categories of the variable again
levels(Accidentalidad$LESIVIDAD)
4.3. Dataset Summary
Let's see what variables and structure the new dataset presents after the transformations made:
str(Accidentalidad)
summary(Accidentalidad)
The output of these commands is omitted for reading simplicity. The main characteristics of the dataset are:
- It is composed of 13 variables: 1 date variable and 12 categorical variables.
- The time range covers from 01-01-2010 to 30-06-2021 (the end date may vary, since the 2021 dataset is updated periodically).
- For space reasons in this post, not all available variables have been considered for analysis and visualization.
4.4. Save the generated dataset
Once the dataset has the structure and variables we need for the visualization, we will save it as a CSV data file so that we can later carry out other statistical analyses or use it in other data processing or visualization tools, such as the one we address below. It is important to save it with UTF-8 (Unicode Transformation Format) encoding so that special characters are correctly identified by any software.
write.csv(Accidentalidad,
          file = "Accidentalidad.csv",
          fileEncoding = "UTF-8")
5. Creation of the visualization on traffic accidents that occur in the city of Madrid using Kibana
To create this interactive visualization, the Kibana tool (in its free version) has been used in our local environment. Before the visualization can be built, the software must be installed; to do so, we have followed the steps of the download and installation tutorial provided by the company Elastic.
Once the Kibana software is installed, we proceed to develop the interactive visualization. Below there are two video tutorials, which show the process of creating the visualization and interacting with it.
This first video tutorial shows the visualization development process by performing the following steps:
- Loading the data into Elasticsearch and generating an index in Kibana, which allows us to interact with the data practically in real time and with the variables contained in the dataset (a programmatic alternative for this loading step is sketched after this list).
- Generation of the following graphical representations:
- Line graph to represent the time series on traffic accidents that occurred in the city of Madrid.
- Horizontal bar chart showing the most common accident types.
- Thematic map showing the number of accidents that occur in each of the districts of the city of Madrid. To create this visual, it is necessary to download the dataset containing the georeferenced districts in GeoJSON format.
 
- Construction of the dashboard integrating the visuals generated in the previous step.
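As a complement to the manual loading shown in the video tutorial, the sketch below outlines one possible way to index the generated CSV into Elasticsearch directly from R, using the elastic package. This package, the index name accidentalidad-madrid and the local instance at localhost:9200 are assumptions for the example and are not part of the tutorial itself.
# Minimal sketch, assuming a local Elasticsearch instance at localhost:9200
# and the "elastic" R package installed (install.packages("elastic"))
library(elastic)
library(readr)

# Connect to the local Elasticsearch instance
conexion <- connect(host = "localhost", port = 9200)

# Read the CSV generated in section 4.4
accidentes <- read_csv("Accidentalidad.csv", locale = locale(encoding = "UTF-8"))

# Bulk-index the table into a hypothetical index named "accidentalidad-madrid"
docs_bulk(conexion, accidentes, index = "accidentalidad-madrid")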
In this second video tutorial we will show the interaction with the visualization that we have just created:
6. Conclusions
Observing the visualization of the data on traffic accidents that occurred in the city of Madrid from 2010 to June 2021, the following conclusions can be drawn:
- The number of accidents in the city of Madrid is stable over the years, except for 2019, when a sharp increase is observed, and the second quarter of 2020, when a significant decrease coincides with the first state of alarm caused by the COVID-19 pandemic (a minimal R sketch for reproducing this yearly count outside Kibana is shown after this list).
- Every year there is a decrease in the number of accidents during the month of August.
- Men tend to have a significantly higher number of accidents than women.
- The most common type of accident is the double collision, followed by collisions involving animals and multiple collisions.
- About 50% of accidents do not cause harm to the people involved.
- The districts with the highest concentration of accidents are: the district of Salamanca, the district of Chamartín and the Centro district.
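For readers who want to check these patterns outside Kibana, the sketch below recomputes the yearly evolution directly in R from the Accidentalidad table prepared in section 4. It simply counts the entries in the register per year with dplyr and plots them with ggplot2 (both loaded with tidyverse); it is an illustrative aid, not part of the original dashboard.
# Minimal sketch, assuming the Accidentalidad table from section 4 is in memory
# and tidyverse is loaded. Each row is an entry in the accident register.
registros_anuales <- Accidentalidad %>%
  count(Año, name = "numero_registros")

# Simple bar chart of registered entries per year
ggplot(registros_anuales, aes(x = Año, y = numero_registros)) +
  geom_col() +
  labs(title = "Entries in the Madrid traffic accident register per year",
       x = "Year", y = "Number of entries")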
Data visualization is one of the most powerful mechanisms for exploiting and analyzing the implicit meaning of data autonomously, regardless of the user's level of technological knowledge. Visualizations allow us to build meaning on top of data and create narratives based on graphical representation.
If you want to learn how to make a prediction about future traffic accident rates using Artificial Intelligence techniques on these data, see the post "Emerging technologies and open data: Predictive Analytics".
We hope you liked this post and we will return to show you new data reuses. See you soon!