The combination of open data with artificial intelligence (AI) is an area of work with the potential to deliver significant advances in multiple fields and improve many aspects of our lives. The most frequently mentioned synergy is the use of open data as input for training AI algorithms, since these systems require large amounts of data to fuel their operations. This makes open data an essential ingredient for AI development, and using it as input brings additional advantages, such as more equal access to the technology and greater transparency about how the algorithms work.
Today, we can find open data powering algorithms for AI applications in diverse areas such as crime prevention, public transportation development, gender equality, environmental protection, healthcare improvement, and the creation of more friendly and liveable cities. All of these objectives are more easily attainable through the appropriate combination of these technological trends.
However, as we will see next, when envisioning the joint future of open data and AI, the combined use of both concepts can also lead to many other improvements in how we currently work with open data throughout its entire lifecycle. Let's review step by step how artificial intelligence can enrich a project with open data.
Utilizing AI to Discover Sources and Prepare Data Sets
Artificial intelligence can assist right from the initial steps of our data projects by supporting the discovery and integration of various data sources, making it easier for organizations to find and use relevant open data for their applications. Furthermore, future trends may involve the development of common data standards, metadata frameworks, and APIs to facilitate the integration of open data with AI technologies, further expanding the possibilities of automating the combination of data from diverse sources.
In addition to automating the guided search for data sources, AI-driven automated processes can help, at least in part, with data cleaning and preparation. This can improve the quality of open data by identifying and correcting errors, filling gaps, and enhancing completeness. It would free data scientists and analysts from basic, repetitive tasks, allowing them to focus on more strategic activities such as developing new ideas and making predictions.
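As a minimal illustration of the kind of automated cleaning described above, the sketch below (the file and column names are hypothetical) flags suspect records and fills gaps by interpolation; real AI-assisted pipelines would go well beyond these rule-based steps.

```python
import pandas as pd

# Illustrative only: the file and column names are hypothetical.
df = pd.read_csv("open_dataset.csv")

# Rule-based check: flag obviously erroneous records (e.g., negative values).
suspect = df[df["value"] < 0]
print(f"{len(suspect)} suspect rows detected")

# Simple automated repairs: drop duplicates and fill gaps
# by interpolating between neighbouring observations.
df = df.drop_duplicates()
df["value"] = df["value"].interpolate(limit_direction="both")
```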
Innovative Techniques for Data Analysis with AI
One characteristic of AI models is their ability to detect patterns and knowledge in large amounts of data. AI techniques such as machine learning, natural language processing, and computer vision can easily be used to extract new perspectives, patterns, and knowledge from open data. Moreover, as technological development continues to advance, we can expect the emergence of even more sophisticated AI techniques specifically tailored for open data analysis, enabling organizations to extract even more value from it.
Simultaneously, AI technologies can help us go a step further in data analysis by facilitating and assisting in collaborative data analysis. Through this process, multiple stakeholders can work together on complex problems and find answers through open data. This would also lead to increased collaboration among researchers, policymakers, and civil society communities in harnessing the full potential of open data to address social challenges. Additionally, this type of collaborative analysis would contribute to improving transparency and inclusivity in decision-making processes.
The Synergy of AI and Open Data
AI can also be used to automate many tasks involved in data presentation, such as creating interactive visualizations simply by providing natural-language instructions or a description of the desired result.
Open data, in turn, enables the development of applications that, combined with artificial intelligence, can provide innovative solutions in sectors such as healthcare, finance, transportation, and education, among others. For example, chatbots providing customer service, algorithms guiding investment decisions, and autonomous vehicles are all powered by AI. By using open data as the primary data source for these services, we would achieve greater transparency and more equitable access to the resulting innovations.
Finally, AI can also be used to analyze large volumes of open data and identify new patterns and trends that would be difficult to detect through human intuition alone. This information can then be used to make better decisions, such as what policies to pursue in each area to bring about the desired changes.
These are just some of the possible future trends at the intersection of open data and artificial intelligence, a future full of opportunities but at the same time not without risks. As AI continues to develop, we can expect to see even more innovative and transformative applications of this technology. This will also require closer collaboration between artificial intelligence researchers and the open data community in opening up new datasets and developing new tools to exploit them. This collaboration is essential in order to shape the future of open data and AI together and ensure that the benefits of AI are available to all in a fair and equitable way.
Content prepared by Carlos Iglesias, Open Data Researcher and Consultant, World Wide Web Foundation.
The contents and views reflected in this publication are the sole responsibility of the author.
Open data is a highly valuable source of knowledge for our society. Thanks to it, applications can be created that contribute to social development and solutions that help shape Europe's digital future and achieve the Sustainable Development Goals (SDGs).
The European Open Data portal (data.europa.eu) organizes online events to showcase projects that have been carried out using open data sources and have helped address some of the challenges our society faces: from combating climate change and boosting the economy to strengthening European democracy and digital transformation.
So far in 2023, four webinars have been held to analyze the positive impact of open data on each of these themes. All the material presented at these events is published on data.europa.eu, and the recordings are available to any interested user on its YouTube channel.
In this post, we take a first look at the showcased use cases related to boosting the economy and democracy, as well as the open data sets used for their development.
Solutions Driving the European Economy and Lifestyle
In a rapidly evolving world where economic challenges and aspirations for a prosperous lifestyle converge, the European Union has demonstrated a remarkable ability to forge innovative solutions that not only drive its own economy but also raise the standard of living of its citizens. In this context, open data has played a pivotal role in the development of applications that address current challenges and lay the groundwork for a prosperous future. Two of these projects were presented in the second webinar of the series "Stories of Use Cases", an event focused on "Open Data to Foster the European Economy and Lifestyle": UNA Women and YouthPOP.
The first project tackles one of the most significant challenges on the road to a just society: gender inequality. Closing the gender gap is a complex social and economic issue. According to World Economic Forum estimates, it will take 132 years to achieve full gender parity. The UNA Women application aims to reduce that figure by guiding young women so they can make better decisions about their education and early career steps. For this use case, the company ITER IDEA processed more than 6 million records from sources such as data.europa.eu, Eurostat, Censis, Istat (Italy's National Institute of Statistics), and NUMBEO.
The second presented use case also targets the young population. This is the YouthPOP application (Youth Public Open Procurement), a tool that encourages young people to participate in public procurement processes. For the development of this app, data from data.europa.eu, Eurostat, and ESCO, among others, have been used. YouthPOP aims to improve youth employment and contribute to the proper functioning of democracy in Europe.
Open Data for Boosting and Strengthening European Democracy
The use of open data also contributes to strengthening and consolidating European democracy. Open data plays a crucial role in our democracies by:
- Providing citizens with reliable information.
- Promoting transparency in governments and public institutions.
- Combating misinformation and fake news.
The theme of the third webinar organized by data.europa.eu on use cases is "Open Data and a New Impetus for European Democracy". This event presented two innovative solutions: EU Integrity Watch and the Institute for Development of Freedom of Information (IDFI).
Firstly, EU Integrity Watch is a platform that provides online tools for citizens, journalists, and civil society to monitor the integrity of decisions made by politicians in the European Union. The website offers visualizations that help users understand the information and provides access to the collected and analyzed data, which is used in scientific publications, journalistic investigations, and other areas, contributing to a more open and transparent government. The tool processes and offers data from the Transparency Register.
The second initiative presented in the democracy webinar is the Institute for Development of Freedom of Information (IDFI), a Georgian non-governmental organization that focuses on monitoring government actions, revealing infractions, and keeping citizens informed.
The main activities of the IDFI include requesting public information from relevant bodies, creating rankings of public bodies, monitoring the websites of these bodies, and advocating for improved access to public information, legislative standards, and related practices. This project obtains, analyzes, and presents open data sets from national public institutions.
In conclusion, open data makes it possible to develop applications that reduce the gender wage gap, boost youth employment, or monitor government actions. These are just a few examples of the value that open data can offer to society.

Learn more about these applications in their webinars: the recordings are available here.
1. Introduction
Visualizations are graphical representations of data that allow the information linked to them to be communicated in a simple and effective way. The possibilities are very wide, from basic representations such as line, bar, or pie charts to visualizations configured on interactive dashboards.
In this "Step-by-Step Visualizations" section we are regularly presenting practical exercises of open data visualizations available in datos.gob.es or other similar catalogs. They address and describe in a simple way the stages necessary to obtain the data, perform the transformations and analyses that are relevant to, finally, enable the creation of interactive visualizations that allow us to obtain final conclusions as a summary of said information. In each of these practical exercises, simple and well-documented code developments are used, as well as tools that are free to use. All generated material is available for reuse in the GitHub Data Lab repository.
As a complement to the explanations below, you can access the code that we will use in the exercise and that we will explain and develop in the following sections of this post.
Access the data lab repository on Github.
Run the data pre-processing code in Google Colab.
2. Objective
The main objective of this exercise is to show how to generate an interactive dashboard that, based on open data, shows us relevant information on the food consumption of Spanish households. To do this, we will pre-process the open data to obtain the tables that we will use in the visualization tool to create the interactive dashboard.
Dashboards are tools that present information in a visual and easily understandable way. They are used to monitor, analyze, and communicate data and indicators. Their content typically includes charts, tables, indicators, maps, and other visual elements that represent relevant data and metrics. These visualizations help users quickly understand a situation, identify trends, spot patterns, and make informed decisions.
Once the data has been analyzed, through this visualization we will be able to answer questions such as those posed below:
- What is the trend in recent years regarding spending and per capita consumption in the different foods that make up the basic basket?
- What foods are the most and least consumed in recent years?
- In which Autonomous Communities is there greater expenditure and consumption of food?
- Has the increase in the cost of certain foods in recent years meant a reduction in their consumption?
These and many other questions can be answered through the dashboard, which will present the information in an orderly and easy-to-interpret way.
3. Resources
3.1. Datasets
The open datasets used in this exercise contain information on per capita consumption and per capita expenditure of the main food groups, broken down by Autonomous Community. The datasets, published by the Ministry of Agriculture, Fisheries and Food (MAPA), are provided as annual series (we will use the series from 2010 to 2021).
Annual series data on household food consumption
These datasets are also available for download from the following GitHub repository.
3.2. Tools
To carry out the data preprocessing tasks, we have used the Python programming language in a Jupyter Notebook hosted in the Google Colab cloud service.
Google Colab, also called Google Colaboratory, is a free cloud service from Google Research that allows you to program, execute, and share Python or R code written in a Jupyter Notebook from your browser, with no configuration required.
For the creation of the dashboard, the Looker Studio tool has been used.
"Looker Studio" formerly known as Google Data Studio, is an online tool that allows you to create interactive dashboards that can be inserted into websites or exported as files. This tool is simple to use and allows multiple customization options.
If you want to know more about tools that can help you with data processing and visualization, you can consult the report "Data processing and visualization tools".
4. Data processing and preparation
You will find the processes described below commented in the following Notebook, which you can run from Google Colab.
Before building an effective visualization, we must carry out prior treatment of the data, paying special attention to how it is obtained and to validating its content, making sure it is in a suitable, consistent format for processing and contains no errors.
As a first step of the process, once the initial datasets are loaded, we need to perform an exploratory data analysis (EDA) to properly interpret the starting data, and to detect anomalies, missing data, or errors that could affect the quality of subsequent processes and results. If you want to know more about this process, you can consult the Practical Guide of Introduction to Exploratory Data Analysis.
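As a hedged illustration of this first step, the sketch below loads one of the datasets and runs the basic EDA checks just mentioned; the file name and separators are assumptions, since the actual MAPA files may differ.

```python
import pandas as pd

# Hypothetical file name and separators; the actual MAPA files may differ.
df = pd.read_csv("consumo_alimentario_2021.csv", sep=";", decimal=",")

df.info()                     # column types and non-null counts
print(df.describe())          # basic statistics for numeric columns
print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # duplicated rows
```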
The next step is to generate the pre-processed data table that we will use to feed the visualization tool (Looker Studio). To do this, we will modify, filter and join the data according to our needs.
The steps followed in this data preprocessing, explained in the following Google Colab Notebook, are as follows (a minimal sketch of the final step follows the list):
- Installation of libraries and loading of datasets
- Exploratory Data Analysis (EDA)
- Generating preprocessed tables
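Without reproducing the Notebook itself, the last step might look like the following sketch, which stacks the annual files into one tidy table that Looker Studio can consume; the per-year file layout and all column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical layout: one file per year (2010-2021) with identical columns.
frames = []
for year in range(2010, 2022):
    df = pd.read_csv(f"consumo_{year}.csv", sep=";", decimal=",")
    df["anio"] = year
    frames.append(df)

# Stack the annual files into a single tidy table:
# one row per (year, Autonomous Community, food category).
tabla = pd.concat(frames, ignore_index=True)

# Keep only the fields the dashboard needs and export for Looker Studio.
columnas = ["anio", "ccaa", "alimento", "consumo_per_capita", "gasto_per_capita"]
tabla[columnas].to_csv("tabla_preprocesada.csv", index=False)
```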
You will be able to reproduce this analysis with the source code available in our GitHub account. The code is provided as a Jupyter Notebook that, once loaded into the development environment, you can easily run or modify. Given the informative nature of this post, and to help non-specialized readers, the code is not intended to be the most efficient but rather the easiest to understand, so you will probably come up with many ways to optimize it for similar purposes. We encourage you to do so!
5. Displaying the interactive dashboard
Once the data has been preprocessed, we can move on to generating the dashboard: a visual tool that provides a summary view of key data and metrics. It is useful for monitoring, decision-making, and effective communication, as it provides a clear and concise view of the relevant information.
To build the interactive visualizations that make up the dashboard, we have used the Looker Studio tool. Since it is an online tool, no software needs to be installed to generate or interact with the visualizations, but the data table we provide must be properly structured, which is why we carried out the preprocessing steps above. If you want to know more about how to use Looker Studio, the following link gives you access to training on the tool.
Below is the dashboard, which can be opened in a new tab via the following link. In the following sections we will break down each of its components.
5.1. Filters
Filters in a dashboard are selection options that allow you to visualize and analyze specific data by applying various filtering criteria to the datasets presented in the dashboard. They help you focus on relevant information and get a more accurate view of your data.

The filters included in the generated dashboard allow you to choose the type of analysis to be displayed, the territory or Autonomous Community, the category of food and the years of the sample.
It also incorporates buttons for clearing the selected filters, downloading the dashboard as a PDF report, and accessing the raw data from which it was built.
5.2. Interactive visualizations
The dashboard is composed of various types of interactive visualizations, which are graphical representations of data that allow users to actively explore and manipulate information.
Unlike static visualizations, interactive visualizations provide the ability to interact with data, allowing users to perform different and interesting actions such as clicking on elements, dragging them, zooming or reducing focus, filtering data, changing parameters and viewing results in real time.
This interaction is especially useful when working with large and complex data sets, as it makes it easier for users to examine different aspects of the data as well as discover patterns, trends and relationships in a more intuitive way.
To define each type of visualization, we have drawn on the data visualization guide for local entities published by the Network of Local Entities for Transparency and Citizen Participation of the FEMP.
5.2.1 Data tables
Data tables allow a large amount of data to be presented in an organized and clear way, with a high information-to-space ratio.
However, they can make it harder to perceive patterns or interpretations than more graphic visual objects.

5.2.2 Choropleth map
It is a map in which numerical data are shown by territory, shading the different areas with varying color intensity. It requires a numerical measure, a categorical field identifying the territory, and geographical data to delimit the area of each territory.

5.2.3 Pie chart
It is a chart that shows data on polar axes, in which the angle of each sector reflects the proportion of a category with respect to the total.

Figure 4. Dashboard pie chart
5.2.4 Line chart
It is a chart that shows the relationship between two or more series of values on two Cartesian axes, with a temporal dimension on the X axis and a numerical measure on the Y axis. These charts are ideal for representing time series with a large number of data points or observations.

Figure 5. Dashboard line chart
5.2.5 Bar chart
It is one of the most widely used chart types owing to its clarity and ease of preparation. It makes values easy to read from the relative length of the bars. The chart displays data using one axis for the quantitative values and another for the qualitative categories or time.

Figure 6. Dashboard bar chart
5.2.6 Hierarchy chart
It is a chart formed by rectangles that represent categories and allow hierarchical groupings. The size and placement of each rectangle vary depending on the value of its measure relative to the total value of the sample.

Figure 7. Dashboard Hierarchy chart
6. Conclusions
Dashboards are one of the most powerful mechanisms for exploiting and analyzing the meaning of data. Their value lies in monitoring, analyzing, and communicating data and indicators in a clear, simple, and effective way.
As a result, we have been able to answer the questions originally posed:
- The trend in per capita consumption has been declining since 2013, when it peaked, with a small rebound in 2020 and 2021.
- Per capita expenditure remained stable from 2011 until 2020, when it rose by 17.7%, with average annual spending going from 1,052 to 1,239 euros, followed by a slight decrease of 4.4% from 2020 to 2021.
- The three most consumed foods across all the years analyzed are fresh fruits, liquid milk, and meat (values in kg).
- The Autonomous Communities with the highest per capita spending are the Basque Country, Catalonia, and Asturias, while Castilla-La Mancha, Andalusia, and Extremadura have the lowest.
- The Autonomous Communities with the highest per capita consumption are Castilla y León, Asturias, and the Basque Country, while the lowest are found in Extremadura, the Canary Islands, and Andalusia.
We have also observed certain interesting patterns, such as a 17.33% increase in the consumption of alcohol (beer, wine, and spirits) between 2019 and 2020.
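Figures like this one come down to a simple year-over-year percentage change on the preprocessed table. A minimal sketch, assuming the hypothetical table and column names used earlier in this post:

```python
import pandas as pd

# Placeholder: the preprocessed table from section 4 (column names assumed).
tabla = pd.read_csv("tabla_preprocesada.csv")

# Total per capita consumption of alcoholic drinks per year.
alcohol = tabla[tabla["alimento"].isin(["cerveza", "vino", "bebidas espirituosas"])]
consumo_anual = alcohol.groupby("anio")["consumo_per_capita"].sum()

# Year-over-year percentage change; .loc[2020] is the 2019 -> 2020 variation.
variacion = consumo_anual.pct_change() * 100
print(f"Change 2019 -> 2020: {variacion.loc[2020]:.2f}%")
```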
You can use the different filters to explore further trends or patterns in the data based on your own interests and concerns.
We hope that this step-by-step visualization has been useful for learning some very common techniques in the treatment and representation of open data. We will be back to show you new reuses. See you soon!
Open solutions, including Open Educational Resources (OER), Open Access to Scientific Information (OA), Free and Open-Source Software (FOSS), and open data, encourage the free flow of information and knowledge and serve as a foundation for addressing global challenges, as UNESCO reminds us.
The United Nations Educational, Scientific and Cultural Organization (UNESCO) recognizes the value of open data in the educational field and believes that its use can help measure compliance with the Sustainable Development Goals, especially Goal 4, Quality Education. Other international organizations also recognize the potential of open data in education. For example, the European Commission has classified the education sector as an area with high potential for open data.
Open data can be used as a tool for education and training in different ways. It can be used to develop new educational materials, and to collect and analyze information about the state of the educational system, which in turn can drive improvement.
The global pandemic marked a milestone in the education field, as the use of new technologies became essential in the teaching and learning process, which became entirely virtual for months. Although the benefits of incorporating ICT and open solutions into education, a trend known as Edtech, had been talked about for years, COVID-19 accelerated this process.
Benefits of Using Open Data in the Classroom
In the following infographic, we summarize the benefits of utilizing open data in education and training, from the perspective of both students and educators, as well as administrators of the education system.
There are many datasets that can be used for developing educational solutions. At datos.gob.es, there are more than 6,700 datasets available, which can be supplemented by others used for educational purposes in different fields, such as literature, geography, history, etc.
Many solutions have been developed using open data for these purposes. We gather some of them based on their purpose: firstly, solutions that provide information on the education system to understand its situation and plan new measures, and secondly, those that offer educational material to use in the classroom.
In essence, open data is a key tool for the strengthening and progress of education, and we must not forget that education is a universal right and one of the main tools for the progress of humanity.
1. Introduction
Visualizations are graphical representations of data that allow the information linked to them to be communicated in a simple and effective way. The possibilities are very wide, from basic representations such as line, bar, or pie charts to visualizations configured on interactive dashboards.
In this "Step-by-Step Visualizations" section we are regularly presenting practical exercises of open data visualizations available on datos.gob.es or similar catalogs. They address and describe in a simple way the stages necessary to obtain the data, perform the transformations and analysis that are relevant to and finally, the creation of interactive visualizations; from which we can extract information summarized in final conclusions. In each of these practical exercises, simple and well-documented code developments are used, as well as free to use tools. All generated material is available for reuse in GitHub's Data Lab repository.
Below, you can access the material that we will use in the exercise and that we will explain and develop in the following sections of this post.
Access the data lab repository on Github.
Run the data pre-processing code in Google Colab.
2. Objective
The main objective of this exercise is to analyze the meteorological data collected at several stations over the last few years. To perform this analysis, we will use different visualizations generated with the "ggplot2" library of the R programming language.
Of all the Spanish weather stations, we have decided to analyze two: one in the coldest province of the country (Burgos) and another in the warmest (Córdoba), according to AEMET data. We will look for patterns and trends in the records between 1990 and 2020 to understand the meteorological evolution over this period.
Once the data has been analyzed, we can answer questions such as those shown below:
- What is the trend in the evolution of temperatures in recent years?
- What is the trend in the evolution of rainfall in recent years?
- Which weather station (Burgos or Córdoba) presents a greater variation of climatological data in recent years?
- What degree of correlation is there between the different climatological variables recorded?
These and many other questions can be addressed with tools such as ggplot2, which facilitate the interpretation of data through visualizations.
3. Resources
3.1. Datasets
The datasets contain different meteorological information of interest for the two stations in question, broken down by year. They can be downloaded from the AEMET download center, upon request of an API key, in the section "monthly / annual climatologies". From the existing weather stations, we have selected the two from which we will obtain the data: Burgos airport (2331) and Córdoba airport (5402).
It should be noted that, along with the datasets, we can also download their metadata, which are of special importance when identifying the different variables registered in the datasets.
These datasets are also available in the Github repository.
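If you prefer to request the data programmatically rather than downloading it manually, a sketch along these lines should work with the AEMET OpenData API. Treat the endpoint path as an assumption based on AEMET's documented URL scheme, and substitute your own API key; the two-step request pattern (envelope first, then data) is how the API responds.

```python
import requests

API_KEY = "YOUR_AEMET_API_KEY"  # requested free of charge from AEMET OpenData
BASE = "https://opendata.aemet.es/opendata/api"

# Assumed endpoint path for monthly/annual climatologies (Burgos airport, 2331);
# check the AEMET OpenData documentation for the exact route.
url = (f"{BASE}/valores/climatologicos/mensualesanuales/datos/"
       f"anioini/1990/aniofin/2020/estacion/2331")

# AEMET replies in two steps: the first call returns a JSON envelope
# whose "datos" field points to the actual data file.
envelope = requests.get(url, params={"api_key": API_KEY}).json()
data = requests.get(envelope["datos"]).json()
print(len(data), "records downloaded")
```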
3.2. Tools
To carry out the data preprocessing tasks, we have used the R programming language in a Jupyter Notebook hosted in the Google Colab cloud service.
Google Colab, also called Google Colaboratory, is a free cloud service from Google Research that allows you to program, execute, and share Python or R code written in a Jupyter Notebook from your browser, with no configuration required.
For the creation of the visualizations, the ggplot2 library has been used.
"ggplot2" is a data visualization package for the R programming language. It focuses on the construction of graphics from layers of aesthetic, geometric and statistical elements. ggplot2 offers a wide range of high-quality statistical charts, including bar charts, line charts, scatter plots, box and whisker charts, and many others.
If you want to know more about tools that can help you with data processing and visualization, you can consult the report "Data processing and visualization tools".
4. Data processing and preparation
You will find the processes described below commented in the Notebook, which you can also run from Google Colab.
Before building an effective visualization, we must carry out prior treatment of the data, paying special attention to how it is obtained and to validating its content, ensuring that it is in a suitable, consistent format for processing and contains no errors.
As a first step, once the necessary libraries have been imported and the datasets loaded, we need to perform an exploratory data analysis (EDA) in order to properly interpret the starting data, and to detect anomalies, missing data, or errors that could affect the quality of subsequent processes and results. If you want to know more about this process, you can consult the Practical Guide of Introduction to Exploratory Data Analysis.
The next step is to generate the preprocessed data tables that we will use in the visualizations. To do this, we will filter the initial data sets and calculate the values that are necessary and of interest for the analysis carried out in this exercise.
Once the preprocessing is finished, we will obtain the data tables "datos_graficas_C" and "datos_graficas_B" which we will use in the next section of the Notebook to generate the visualizations.
The structure of the Notebook, in which the steps described above are carried out together with explanatory comments on each of them, is as follows:
- Installation and loading of libraries.
- Loading datasets
- Exploratory Data Analysis (EDA)
- Preparing the data tables
- Visualizations
- Saving graphics
You will be able to reproduce this analysis, as the source code is available in our GitHub account. The code is provided as a Jupyter Notebook that, once loaded into the development environment, you can easily run or modify. Given the informative nature of this post, and to help non-specialized readers, the code is not intended to be the most efficient but rather the easiest to understand, so you will probably come up with many ways to optimize it for similar purposes. We encourage you to do so!
5. Visualizations
Various types of visualizations and charts have been created to extract information from the tables of preprocessed data and answer the initial questions posed in this exercise. As mentioned previously, the R "ggplot2" package has been used to create the visualizations.
The "ggplot2" package is a data visualization library in the R programming language. It was developed by Hadley Wickham and is part of the "tidyverse" package toolkit. The "ggplot2" package is built around the concept of "graph grammar", which is a theoretical framework for building graphs by combining basic elements of data visualization such as layers, scales, legends, annotations, and themes. This allows you to create complex, custom data visualizations with cleaner, more structured code.
If you want to have a summary view of the possibilities of visualizations with ggplot2, see the following "cheatsheet". You can also get more detailed information in the following "user manual".
5.1. Line charts
Line charts are a graphical representation of data that uses points connected by lines to show the evolution of a variable in a continuous dimension, such as time. The values of the variable are represented on the vertical axis and the continuous dimension on the horizontal axis. Line charts are useful for visualizing trends, comparing evolutions, and detecting patterns.
Next, we can visualize several line graphs with the temporal evolution of the values of average, minimum and maximum temperatures of the two meteorological stations analyzed (Córdoba and Burgos). On these graphs, we have introduced trend lines to be able to observe their evolution in a visual and simple way.
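The exercise itself builds these charts in R with ggplot2. Purely as an illustration of the layered grammar described above, and to keep the code snippets in this post in Python, here is an equivalent sketch using plotnine, the Python port of ggplot2; the data frame and column names are assumptions, with synthetic values standing in for the real series.

```python
import pandas as pd
from plotnine import ggplot, aes, geom_line, geom_smooth, labs

# Synthetic stand-in for the preprocessed table (column names assumed).
anios = list(range(1990, 2021))
datos = pd.DataFrame({
    "anio": anios,
    "tm_mes": [14.0 + 0.036 * (a - 1990) for a in anios],
})

plot = (
    ggplot(datos, aes(x="anio", y="tm_mes"))
    + geom_line()               # temporal evolution of the average temperature
    + geom_smooth(method="lm")  # linear trend line, as in the exercise
    + labs(x="Year", y="Average temperature (ºC)")
)
plot.save("temperatura_media.png")
```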
To compare the evolutions, not only visually through the plotted trend lines but also numerically, we obtain the slope coefficients of each trend line, that is, the change in the response variable (tm_mes, tm_min, tm_max) for each unit of change in the predictor variable (year).
- Average temperature slope coefficient, Córdoba: 0.036
- Average temperature slope coefficient, Burgos: 0.025
- Minimum temperature slope coefficient, Córdoba: 0.020
- Minimum temperature slope coefficient, Burgos: 0.020
- Maximum temperature slope coefficient, Córdoba: 0.051
- Maximum temperature slope coefficient, Burgos: 0.030
We can interpret that the higher this value, the steeper the rise of the corresponding temperature over the observed period.
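These coefficients come from fitting a linear trend to each series. As a language-agnostic check, the slope can be reproduced with a simple least-squares fit; the arrays below are placeholders for the actual annual series.

```python
import numpy as np

# Placeholder arrays standing in for a station's annual series.
years = np.arange(1990, 2021)
tm_mes = 14.0 + 0.036 * (years - 1990)  # synthetic average temperatures

# Degree-1 least-squares fit: the first coefficient is the slope,
# i.e. the change in temperature per year along the trend line.
slope, intercept = np.polyfit(years, tm_mes, 1)
print(f"Trend: {slope:.3f} degrees per year")
```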
Finally, we have created a line graph for each weather station, in which we jointly visualize the evolution of average, minimum and maximum temperatures over the years.
