Interview

R Hispano is a community of users and developers that was born in 2011, at the III Conference of R Users, with the aim of promoting the advancement of knowledge and the use of the R programming language. At datos.gob.es we have spoken with them so that they can tell us more about the activities they carry out and the role of open data in them.

Full interview.

1. Can you briefly explain what the R-Hispano Community is?

It is an association created in Spain whose objective is to promote the use of R among a Hispanic audience. There are many R users worldwide and we try to serve as a meeting point for all those whose primary language is Spanish. By having a smaller group within such a large community as a reference, it is easier to build relationships and meet people to turn to when you want to learn more or share what you have learned.

2. R was born as a language linked to the statistical exploitation of data; however, it has become an essential tool of data science. Why has the community embraced this language so widely?

It's true that many data science and engineering professionals tend to use more generic languages like Python. However, there are several reasons why R is essential in the "stack" of teams that work with data. First of all, R has its origin in the S language, which was designed in the 1970s at Bell Laboratories specifically for data analysis. This allows people with different computing backgrounds to participate in complex projects, focusing on analysis methods. Second, R has aged very well, and a broad community of users, developers, and businesses contributes to the project with packages and tools that quickly extend its functionality to the most innovative methods with (relative) simplicity and rigor.

3. R Hispano works through numerous local initiatives, what advantages does this form of organization bring?  

In day-to-day activities, especially when we had face-to-face meetings, more than a year ago, it is more comfortable to coordinate people as closely as possible. It makes no sense for a person in Madrid to organize monthly meetings in Malaga, Seville or the Canary Islands. The interesting thing about these events is to attend regularly, get to know the attendees, understand what the public demands and what can be offered. That, apart from pampering and dedication, requires being close because, otherwise, there is no way to establish that bond. That is why it seemed to us that it is from the cities themselves that this relationship has to be maintained from day to day. On the other hand, it is the way in which the Community of R has been organized around the world, with the success that we all know.

4. Do you consider open data initiatives a valuable source of information for the development of your projects? Any notable reuse examples? What aspects of the current initiatives do you consider could be improved?

The first thing to say is that R Hispano as such has no projects. However, many R Hispano partners work with open data in their professional field, be it academic or business. Of course, it is a very valuable source of information, with many examples, such as the analysis of data from the pandemic that we are still suffering, data from sports competitions and athletes' performance, environmental and socio-economic data, and so on. We cannot single out any of them because many very interesting ones would deserve it equally. As for improvements, there are still many public data repositories that do not publish their data in a format that analysts can readily process. A PDF report can be open data, but it certainly does not contribute to its dissemination, analysis, and exploitation for the good of society.

5. Can you tell us about some of the activities carried out by these local Initiatives?

Several local R groups, both in Spain and Latam, recently collaborated with the technology training company UTad in the event “Encounters in the R phase”, held online over two days. The R user days that we celebrate each year are usually organized by one of the local groups at the headquarters. The Córdoba group is organizing the next ones, postponed due to the pandemic and for which we hope to announce dates soon.

The Madrid R User Group began to function as a local group linked to the Hispanic R Community more than fifteen years ago. Since its origin, it has held monthly meetings announced on the social network Meetup (sponsored by the R Consortium, an entity founded and subsidized by large companies to promote the use of R). The activity has been interrupted by the Covid-19 restrictions, but the full history of the presentations has been compiled in this portal.

The R Canarias group has been involved in the TabularConf conference, which took place online on January 30, with an agenda of a dozen presentations on data science and artificial intelligence. In the past, the Canarian group held an R user meeting with talks on various topics, including modeling, geographic data processing, and queries to public data APIs, such as datos.gob.es, with the library opendataes. Other libraries presented at a meetup they held in 2020 are istacr and inebaseR, always betting on access to public data.

In the Local Group of Seville, during the hackathons held in recent years, they have begun to develop several packages closely linked to open data:

  • Air: To get air quality data in Andalusia (works, but needs some adjustments)
  • Aemet: R package to interact with the AEMET API (climatic data). We took the first steps in a hackathon, then Manuel Pizarro made a fully functional package.
  • Andaclima: Package to obtain climatic data from agroclimatic stations of the Junta de Andalucía
  • Data.gob.es.r: Package embryo to interact with http://datos.gob.es. Really just an exploration of ideas, nothing functional for now.

Regarding COVID-19, it is worth highlighting the development by the UCLM, with the collaboration of a former member of the Board of Directors of the R Hispano Community, of this COVID-19 analysis panel, with the cases that the Board of Communities of Castilla-La Mancha reported by municipality. It consists of an interactive tool to consult the information on the incidence and rates per 100,000 inhabitants.

6. In addition, they also collaborate with other groups and initiatives.

Yes, we collaborate with other groups and initiatives focused on data, such as the UNED (Faculty of Sciences), which for a long period of time welcomed us as its permanent headquarters. I would also highlight our activities with:

  • Data Journalism Group. Joint presentations with the Data Journalism group, sharing the benefits of R for their analysis.
  • A collaboration with the Group Machine Learning Spain that resulted in a common presentation in the Google Campus of Madrid.
  • With groups of other data languages, such as Python.
  • Collaborations with companies. At this point we highlight having participated in two Advanced Analytics events organized by Microsoft, as well as having received small financial aid from companies such as Kabel or Kernel Analytics (recently acquired by Boston Consulting Group).

In addition, different partners of R-Hispano also collaborate with academic institutions, in which they teach different courses related to data analysis, especially promoting the use and analysis of open data, such as the Faculty of Economics of the UNED, the Faculties of Statistics and Tourism and Commerce of the UCM, the University of Castilla-La Mancha, the EOI (specific subject on open data), the Francisco de Vitoria University, the Higher School of Telecommunications Engineering, the ESIC and the K-School.

Finally, we would like to highlight the constant link that is maintained with different relevant entities of the R ecosystem: with R-Consortium (https://www.r-consortium.org/) and RStudio (https://rstudio.com/). It is through the R-Consortium where we have obtained the recognition of the Madrid Group as a stable group and from which we obtain the sponsorship for the payment of Meetup. Within RStudio we maintain different contacts that have also allowed us to obtain sponsorships that have helped in the R Conference, as well as speakers of the stature of Javier Luraschi (author of the package and book on “sparklyr”) or Max Kuhn (author of packages such as "Caret" and its evolution "tidymodels").

7. Through ROpenSpain, some RHispano partners have collaborated in the creation of packages in R that facilitate the use of open data. 

ROpenSpain is a community of R, open data and reproducibility enthusiasts who come together and organize to create R packages of the highest quality for the exploitation of Spanish data of general interest. Inspired by ROpenSci, it was born in February 2018 as an organization on GitHub and has a collaboration channel in Slack. As of January 2021, ROpenSpain groups the following R packages:

  • opendataes: Easily interact with the datos.gob.es API, which provides data from public administrations throughout Spain.
  • MicroDatosEs: Allows importing to R various types of INE microdata files: EPA, Census, etc.
  • caRtociudad: Consult the Cartociudad API, which provides geolocation services, routes, maps, etc.
  • Siane: To represent statistical information on the maps of the National Geographic Institute.
  • airqualityES: Air quality data in Spain from 2011 to 2018.
  • mapSpain: To load maps of municipalities, provinces and Autonomous Communities. Includes a plugin for leaflet.
  • MorbiditySpainR: Read and manipulate data from the Hospital Morbidity Survey
  • spanish: For the processing of certain types of Spanish information: numbers, cadastral geocoding, etc.
  • BOE: For the processing of the Official State Gazette and the Official Gazette of the Mercantile Registry.
  • istacbaser: To consult the API of the Canary Institute of Statistics.
  • CatastRo: Consult the Cadastre (Land Registry) API.

Some of these packages have been featured at events organized by the R Hispano Community.

8. Finally, how can interested people follow R-Hispano and collaborate with you?

An important element as a link in the entire community of R users in Spanish is the R-Help-es help list:

It is one of the few active R-Help lists independent of the main English R-Help that has generated more than 12,800 entries in its more than 12-year history.

In addition, a high level of activity is maintained on social networks, which serve as a loudspeaker and a lever through which future events or news related to data of interest to the community are announced.

We can highlight the following initiatives in each of the platforms:

  • Twitter: Presence of the R-Hispano association itself; https://twitter.com/R_Hisp and participation in the hashtag #rstatsES (R in Spanish) of different R collaborators at the national level.
  • LinkedIn: In this professional network, "R" has a presence through the company page https://www.linkedin.com/company/comunidad-r-hispano/. In addition, many R-Hispano partners from both Spain and Latam are part of this network, sharing open resources.
  • Telegram channel: (https://t.me/rhispano) There is a Telegram channel where news of interest to the community is shared regularly.

Finally, on the association's website, http://r-es.org, you can find information about the association, as well as how to become a member (the fee is, like R, free)

News

The European Data Portal (EDP) has presented its report "Copernicus data for the open data community", prepared by con.terra as part of the EDP consortium. As we have reported before, Copernicus is the European Union's Earth Observation program that provides accurate, timely and easily accessible information to improve environmental management, understand and mitigate the effects of climate change and ensure civil security.

The report aims to help users harness the potential of Copernicus data to create Earth observation applications by answering three basic questions:

  • What can I do with Copernicus data?
  • How can I access the data?
  • What tools do I need to use the data?

After an introduction reviewing the main activities and services available from the program, the report is divided into two parts: a first part where examples of Copernicus data applications are examined and a second, more practical part, where a particular use case is replicated in depth.

Copernicus use cases

The first part covers a series of possible use cases at a general level to answer the first of the questions posed above: what can be done with Copernicus data?

The use cases discussed are linked to the thematic areas addressed by the Copernicus program (emergency, security, marine, land, climate change and atmospheric monitoring), as well as to its services and tools. These examples cover the observation of plastic pollution of the oceans, land change due to mining activities, the impact of volcanic activities, ice loss, the creation of artificial islands, deforestation, forest fires, storms or pests.

Screenshot of the Copernicus website showing the services it provides.

The report highlights the importance of knowing what data are appropriate for each specific use case. For example, SENTINEL 2 MSI data is suitable for land monitoring, emergency management and security services, while SENTINEL 3 Altimetry data is linked to the areas of marine monitoring and climate change. To assist in this identification task, the guide includes references to various user guides with specifications on the missions, the instruments used to collect the data and the data products generated.

Case study on the use of Copernicus data 

The second part of the report focuses on a particular use case that it addresses in depth, including how to download the appropriate data, process it and build applications with it. Specifically, it addresses the mapping of the lava flow of the Etna volcano using data from the Copernicus emergency management service. The aim is to track the impact of volcanic activities on nature and urban areas.

First, the report shows how to search and download data for this area of interest.  In this case, Sentinel-2 products are used from the Copernicus Open Access Hub. The entry point for accessing the Copernicus data is their own website, which provides an overview of the data access points. Through different images, the report shows search and filter options to locate the appropriate data.

Screenshot showing sentinel-2 scene search and download capabilities from Copernicus Open Access Hub (url:scihub.copernicus.eu/dhus/)

To visualize and process the data, it is proposed to use commercial software such as ArcGIS Pro,  free GIS tools such as QGIS, open source processing tools such as SNAP or programming libraries such as GDAL. In the case of the example, SNAP (Sentinel Application Platform), the European Space Agency (ESA) platform, is used to view the lava flow.

Some explanations on workflow automation with the Open Access Hub API and the SNAPgraph tool are given at the end of the chapter.
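As a rough illustration of what such automation might look like, the following Python sketch builds a search URL for Sentinel-2 products over the Etna area. It is not taken from the report: the search endpoint path, the query keywords (`platformname`, `footprint`, `beginposition`) and the date range are assumptions based on the hub's OpenSearch-style interface, and the service may have changed since. The sketch only constructs the URL; actually sending the request would require a registered hub account.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: OpenSearch-style search endpoint of the Copernicus Open Access Hub.
HUB_SEARCH_URL = "https://scihub.copernicus.eu/dhus/search"


def build_sentinel2_query(lat, lon, start, end, rows=10):
    """Build a search URL for Sentinel-2 products whose footprint covers a point.

    The keyword names used in the query string are assumptions, not confirmed
    by the report.
    """
    terms = [
        "platformname:Sentinel-2",
        # Assumed convention: point footprints are given as "latitude, longitude".
        f'footprint:"Intersects({lat}, {lon})"',
        f"beginposition:[{start} TO {end}]",
    ]
    params = {"q": " AND ".join(terms), "rows": rows, "start": 0}
    return f"{HUB_SEARCH_URL}?{urlencode(params)}"


# Approximate coordinates of Mount Etna; the date range is purely illustrative.
url = build_sentinel2_query(
    37.75, 14.99, "2021-02-01T00:00:00.000Z", "2021-03-01T00:00:00.000Z"
)

# Decode the query again to check what would be sent to the hub.
query = parse_qs(urlsplit(url).query)["q"][0]
print(url)
print(query)
```

Keeping the query construction in a small function makes it easy to reuse the same filter for other areas of interest or time windows before handing the downloaded scenes to a tool such as SNAP.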

Final conclusions

The report ends with several conclusions, among which the following stand out:

  1. Users can extract great value from Copernicus data but to do so they need to be familiar with the platforms involved and the necessary tools.
  2. For most use cases, it is necessary to combine Copernicus data with in situ data. The Copernicus program itself uses data from ground-based sensors, for example, for calibration and validation of its products.

Spain's role in Copernicus

The Ministry of Transport, Mobility and Urban Agenda, through the National Geographic Institute, and the Ministry for Ecological Transition (MITECO) represent Spain in the Copernicus User Forum, for the monitoring and evolution of the program. In this interview Nuria Valcárcel, Deputy Assistant Director (Observation of the Territory) of the General Directorate of Geodesy and Cartography, in the D.G. National Geographic Institute (IGN) delves into the services of Copernicus and its usefulness in the economic and social field.

In datos.gob.es you can also find this other interview with Stéphane Ourevitch, founder of SpaceTec, who participated as a speaker at the Encuentro Aporta 2019, where he tells us about the usefulness of data for space observation and how the Copernicus program promotes entrepreneurship through actions such as hackathons.

Copernicus data are very useful all over the world. In our country, we also find multiple services and applications developed based on Copernicus data, some of which are collected in this article.

News

Open Data Day was the date chosen for the launch of the EU Datathon 2021, an event that is now in its fifth edition. Organized by the Publications Office of the European Union within the framework of the EU Open Data Days, it seeks to highlight the value of open data and show the opportunities of business models based on it.

Participating teams must create a mobile or web application that responds to challenges related to the priorities of the European Commission, using open data sets.

3 challenges to solve

The available challenges are:

  • Challenge 1: ‘A European Green Deal’. The European Green Deal is the blueprint to drive a modern, sustainable and competitive European economy. Those who choose this challenge will need to develop applications or services aimed at creating a greener Europe, for example by promoting efficient use of resources or reducing pollution.
  • Challenge 2: ‘An economy that works for people’. In this case, applications or services aimed at companies, public administrations or citizens in general are sought in order to create a fairer economic and monetary union, which allows the growth of the economies of the member countries together with the reduction of poverty and inequality. This category would include, for example, a solution to boost youth job creation.
  • Challenge 3: ‘A Europe fit for the digital age’. The EU aspires to a digital transformation that works for people and businesses. Therefore, this challenge encourages the creation of applications or services that improve data competencies, increase connectivity or make data more understandable for everyone, based on the European Data Strategy.

Each challenge is organized as a separate competition of equal importance.

The solutions presented must combine at least one data set from data.europa.eu (EU Open Data Portal or European Data Portal) with any other publicly available data set.

Who can participate?

It is aimed at citizens around the world who are interested in prototyping products based on public open data and even creating new business models for profit or not for profit through the exploration of such data.

Participation is open to individuals or legal entities, integrated in teams of between one and four members.

Staff working in the institutions, agencies, bodies, partner organizations or contractors of the EU Publications Office cannot register.

How does the competition develop?

Participating teams must register their proposal using this form before 23:59 CET (Central European Time) on May 21, 2021. From then on, the competition will take place in 2 phases:

  1. Preselection

All proposals will be evaluated by the jury, made up of experts from within and outside the EU institutions and agencies based on a series of criteria such as the relevance of the selected challenge or the potential and creativity of the proposed solution.

For each challenge, a maximum of three teams that have obtained the highest number of points will be shortlisted. All participating teams will receive notification of the results before June 11, 2021.

  2. Final phase

Shortlisted teams (three per challenge) will be invited to convert their proposals into applications. In addition, each shortlisted team must produce a 60-second video, in which they present their application and the team working on it, before August 1, 2021.

The final of the competition will take place on November 25. The applications presented in each challenge will be evaluated by a jury made up of at least five experts. In addition to the relevance of the proposal, the open data used (economic and social potential, interoperability with other data sources, etc.) and the adequacy of the objective (product maturity, benefits offered, etc.) will be taken into account.

What is the prize?

In total, 99,000 euros will be awarded (33,000 for each challenge), which will be divided for each challenge as follows:

  • First place: 18,000 euros
  • Second place: 10,000 euros
  • Third place: 5,000 euros
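The breakdown above can be checked with a few lines of Python. This is only a sanity check of the stated figures; the variable names are illustrative.

```python
# Prize amounts per challenge, in euros, as listed above.
PRIZES_PER_CHALLENGE = {"first": 18_000, "second": 10_000, "third": 5_000}
NUM_CHALLENGES = 3

# Sum of the three prizes within one challenge.
per_challenge = sum(PRIZES_PER_CHALLENGE.values())

# Total across the three challenges.
total = per_challenge * NUM_CHALLENGES

print(per_challenge, total)  # 33000 99000
```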

From datos.gob.es we invite you to submit your proposal. Good luck to all participants!

Blog

More than 2.5 billion tonnes. That is the amount of waste that, according to the European Union, is produced every year in the EU - although the specific figure is from 2016 -, with the consequent danger to the environment and our own future. This worrying situation is leading the European institutions to promote a change of model in waste management.

For years, waste management has been linear, i.e. natural resources are extracted, used to produce a certain good, consumed and then discarded. This system requires large amounts of energy and cheap, easily accessible materials. This model also includes practices such as planned obsolescence, very common in the technological field, where the company "programs" the end of the useful life of the product so that it stops working after a certain period of time.

This linear model is not viable in the long term, which is why more and more people are calling for a change to a management based on the circular economy.

What is the circular economy?

Circular economy refers to an economic system that replaces the concept of 'end-of-life' with 'reduce, reuse, recycle and recover materials in the production, distribution and consumption processes'. In other words, instead of discarding products, they re-enter the production cycle, which contributes to creating environmental quality, economic prosperity and social equity for the benefit of current and future generations.

Through these actions, we can maximise the life cycle of products and minimise waste. When a product ceases to function, its materials can still be used to create new products and remain in the economy wherever possible.

There are many benefits to this practice, from reduced greenhouse gas emissions to financial savings for businesses and consumers, who can benefit from longer-lasting products.

How does open data contribute to fostering the circular economy?

As in other fields, information obtained through open data can help drive better decision-making on the efficient use of resources. Data can help train algorithms to predict certain trends and help citizens, administrations and businesses to implement the necessary measures to ensure a sustainable future.

In its article ‘Open Data and the Circular Economy’, the European Data Portal details 3 areas where open data has a major impact on the circular economy:

  • A more sustainable food system. Open data can help solve logistical problems, improve efficiency and ensure food security. Data on production and distribution, temperature changes of products, rising water levels or mapping of deforestation can improve strategic decision-making to regulate supply and demand across Europe, avoiding over-consumption of resources. In this regard, an example is Smartchain's open data-based research, which aims to develop a shorter and more sustainable food supply chain.
  • Efficient resource management and waste optimisation. The selective collection process and the use of the total capacity of recycling plants can be improved with the right information. In this regard, Santiago City Council has implemented a smart municipal solid waste collection system using IoT technology and machine learning algorithms enriched with open data. In this area, citizen awareness is also fundamental through apps such as EcoCity,  which monitors waste management in cities and sets a series of targets to improve urban recycling habits and reduce waste generation. Users can choose the recycling bin they want to monitor in their neighbourhood. If they detect any incidents with the registered bins, they can send a warning directly to the local council.
  • Pollution reduction. Open data on contamination of the air or our seas helps to raise awareness of pollution and its health risks. This type of information can improve the decision-making process to protect the health of EU citizens and the environment through preventive measures, such as halting the expansion of London Heathrow Airport. Applications and visualisations such as the National Air Index, Aire.cat or this freshwater ecosystem explorer show indicators that raise awareness of the reality of our environment.

How Europe's circular economy is progressing

In March 2020, within the framework of the European Green Deal, the European Commission presented a new Circular Economy Action Plan, which includes proposals on designing more sustainable products, reducing waste and empowering citizens (such as the "right to repair").

In addition, in order to effectively and efficiently implement the new sustainable products framework, the Commission is pursuing a number of data actions such as:

  • Establish a common European Green Deal data space for smart applications with data on value chains and product information.
  • Provide harmonised data on concentrations of microplastics in seawater.
  • Cooperate with industry to develop harmonised systems for monitoring and managing information on hazardous substances, in synergy with measures under the sustainable product policy framework and the European Chemicals Agency (ECHA).
  • Encourage the publication of environmental data by companies through the revision of the non-financial reporting directive.
  • Support a business initiative to develop environmental accounting principles that complement financial data with circular economy performance data.
  • In addition, Horizon Europe will support the development of indicators and innovative data, materials and products that help drive the circular economy.

Data actions included in the Circular Economy Action Plan

In our country, the promotion of the circular economy is marked by the Spanish Circular Economy Strategy 2030 (EEEC), whose objectives for 2030 include reducing waste generation by 15% compared to 2010, improving water use efficiency by 10% and reducing greenhouse gas emissions to below 10 million tonnes of CO2 equivalent.

We live in a context of increasing demand for raw materials and scarcity of resources. Many raw materials are finite and, as the world's population increases, so does demand. The circular economy is therefore a key element for the optimal development of the future of the entire population. Within all the initiatives that are already underway, data can play a key role in increasing our knowledge and driving technologies that help all citizens to move towards a sustainable future.


Content prepared by the datos.gob.es team.

News

The open data ecosystem has been very active over the last few months. The year 2020 has ended with two important developments. The first, the Open Data Maturity Report published by the European Data Portal, where Spain has increased its overall position by 5% and remains among the leaders in the European sector.  The second is the new National Strategy for Artificial Intelligence, which includes a series of actions related to open data.

But many more developments have taken place in recent months. In this article we tell you about some examples at national, regional and local level.

State initiatives related to open data

In addition to the publication of the Artificial Intelligence Strategy, in recent weeks the Plan for the Digitalization of Public Administrations 2021-2025 has also been presented, which will mobilize public investment of at least 2,600 million euros over the next three years. Among its points, the plan highlights "the importance of evolving the model of access to public and private information to promote high value-added services". To this end, it will build on the work carried out by the Aporta Initiative in the field of open data.

In addition to these strategic actions, it should be noted that some state agencies have taken advantage of the winter season to launch new projects linked to open data:

  • The Ministry of Transport, Mobility and Urban Agenda has published its mobility study with Big Data that characterizes mobility at national, autonomous community, provincial and local levels during the COVID-19 pandemic. The data generated in the study has been made available to citizens in open data format and has been used to develop a series of indicators.
  • The Ministry of Tourism has launched 'Dataestur', a platform that collects basic data on tourism in Spain and from which you can access the various sources of tourism statistics from public and private organizations.

Local open data initiatives

During these last months, several municipalities and institutions have carried out initiatives related to open data, such as:

  • The open data portal of the Government of Aragon launched a new chatbot service that makes it easier for citizens to access the information available in Aragon Open Data. Thanks to this, Aragonese people can be better informed and make use of the data in a more accessible way.
  • The Community of Murcia creates 'Education in Open Government', a new educational program to bring concepts such as accountability and citizen participation closer to students.
  • The City Council of Santiago de Compostela has developed and built new smart surface collectors for the characterization of organic solid waste, through the use of IoT technologies and algorithms trained with open data. This action allows it to advance on its path towards becoming a smart city.
  • The Government of the Canary Islands launched its new open data portal, which has more than 7,500 datasets. As a result, this portal has become the single access point with the most public data registered in all of Spain.
  • The Community of Madrid has announced a new open data strategy to promote and strengthen the transparency of the administration and promote economic development based on knowledge, information and data.
  • The Ronda City Council has launched "Geoportal Ronda", a new spatial open data tool through which you can now consult all the geographic and urban information of the municipality.
  • The City Council of Malaga has received recognition at the IDC Awards thanks to a municipal project that values open data and its uses to improve the quality of life of citizens. Specifically, the Consistory has been third in the category of 'Economic development and citizen engagement'.
  • The Vigo City Council has also been awarded in the category 'Planning and administration' thanks to the Smart City VCI+ platform, which centralizes and structures the city's data to create a scorecard that allows a more efficient local management. The platform includes an open data portal so that citizens can consult municipal data in different formats and use them in professional and private environments.

International developments

We finish the review by including some examples of international projects related to the subject:

  • The European Union is driving the EO4AGRI project, which seeks to use earth observation data from the Copernicus program to digitize the agricultural sector and adapt the CAP to the new times.
  • The European Commission has launched a public consultation to gather feedback on public sector interoperability initiatives in the EU. The information gathered will feed into the evaluation of the European Interoperability Framework (EIF). The deadline for participation is April 27.
  • An international expedition has created a dataset that collects information on the physical and biological dynamics of the Arctic to help better understand climate change. For the time being, this data is reserved for exclusive use until January 1, 2023, when it will be published openly.
  • China has unveiled a public data platform that makes it possible to check emission levels in real time to see which factories and institutions pollute the most.
  • Argentina's Ministry of Tourism and Sports has launched an open data portal using Andino, a platform on top of CKAN.

These are just a few examples from the world of open data, but there are many more. If you know of any other interesting developments, you can mention them in the comments or send us an email at dinamización@datos.gob.es.

News

Last October, the Aporta Initiative, together with the Secretary of State for Digitalization and Artificial Intelligence and Red.es, launched the third edition of the Aporta Challenge. Under the slogan "The value of data in digital education", the aim was to reward ideas and prototypes capable of identifying new opportunities to capture, analyse and use data intelligence in the development of solutions in the field of education.

The proposals submitted in Phase I were wide-ranging. They came from individuals, university academic teams, educational institutions and private companies, which devised web platforms, mobile applications and interactive solutions with data analytics and machine learning techniques as protagonists.

A prestigious jury evaluated the proposals submitted against a series of public criteria. The 10 solutions selected as finalists are:

 

EducaWood

  • Team: Jimena Andrade, Guillermo Vega, Miguel Bote, Juan Ignacio Asensio, Irene Ruano, Felipe Bravo and Cristóbal Ordóñez.

What is it?

EducaWood is a socio-semantic web portal that allows users to explore the forest information of an area of Spain and to enrich it with tree annotations. Teachers can propose environmental learning activities contextualized to their environment. Students carry out these activities during field visits by means of tree annotations (location and identification of species, measurements, microhabitats, photos, etc.) through their mobile devices. In addition, EducaWood allows virtual field visits and remote activities with the available forestry information and annotations generated by the community, thus enabling its use by vulnerable groups and in Covid scenarios.

EducaWood uses sources such as the Spanish Forest Map, the National Forest Inventory or GeoNames, which have been integrated and republished as linked open data. The annotations generated by the students' activities will also be published as linked open data, thus contributing to community benefit.

Data Education. Innovation and Human Rights.

  • Team: María Concepción Catalán, Asociación Innovación y Derechos Humanos (ihr.world).

What is it?

This proposal presents a data education web portal for students and teachers focused on the Sustainable Development Goals (SDGs). Its main objective is to propose to its users different challenges to be solved through the use of data, such as 'What were women doing in Spain in 1920' or 'How much energy is needed to maintain a farm of 200 pigs'.

This initiative uses data from various sources such as the UN, the World Bank, Our World in Data, the European Union and each of its countries. In the case of Spain, it uses data from datos.gob.es and INE, among others.

UniversiDATA-Lab

  • Team: Rey Juan Carlos University, Complutense University of Madrid, Autonomous University of Madrid, Carlos III University of Madrid and DIMETRICAL The Analytics Lab S.L.

What is it?

UniversiDATA-Lab is a public and open portal whose function is to host a catalog of advanced and automatic analyses of the datasets published in the UniversiDATA portal, and which is the result of the collaborative work of universities. It arises as a natural evolution of the current "laboratory" section of UniversiDATA, opening the scope of potential analysis to all present and future datasets/universities, in order to improve the aspects analysed and encourage universities to be laboratories of citizenship, providing a differential value to society.

All the datasets that universities are publishing or will publish in UniversiDATA are potentially usable to carry out in-depth analyses, always considering the respect for the protection of personal data. The specific sources of the analyses will be published on GitHub to encourage the collaboration of other users to contribute improvements.

LocalizARTE

  • Team: Pablo García, Adolfo Ruiz, Miguel Luis Bote, Guillermo Vega, Sergio Serrano, Eduardo Gómez, Yannis Dimitriadis, Alejandra Martínez and Juan Ignacio Asensio.

What is it?

This web application aims to support the learning of art history in different educational environments. It allows students to view and perform geotagged tasks on a map. Teachers can propose new tasks, which are added to the public repository, select the tasks that may be most interesting for their students and view the ones their students perform. In addition, a mobile version of LocalizARTE will be developed in the future, in which users will need to be close to the place where a task is geotagged in order to perform it.

The open data used in the first version of LocalizARTE comes from the list of historical monuments of Castilla y León, DBpedia, Wikidata, Casual Learn SPARQL and OpenStreetMap.

Study PISA data and datos.gob.es

  • Team: Antonio Benito, Iván Robles and Beatriz Martínez.

What is it?

This project is based on a dashboard that displays information from the PISA report, conducted by the OECD, or from other educational assessments, alongside socioeconomic, demographic, educational or scientific data provided by datos.gob.es. The objective is to detect which factors favour an increase in academic performance using a machine learning model, so that effective decisions can be made. The idea is that schools themselves can adapt their educational practices and curricula to the learning needs of students to ensure greater success.

This application uses various open data from INE, the Ministry of Education and Vocational Training or PISA Spain.

Big Data in Secondary Education... and Secondary in Education

  • Team: Carmen Navarro, Nazaret Oporto School.

What is it?

This proposal pursues two objectives. The first is to improve the training of secondary school students in digital skills, such as the control of their digital profiles on the Internet or the use of open data for their work and projects. The second is to use the data generated by students in an e-learning platform such as Moodle to determine patterns and metrics that help personalize learning. All of this is aligned with the SDGs and the 2030 Agenda.

Data used for its development come from the WHO and the datathon "Big Data in the fight against obesity", where several students proposed measures to mitigate global obesity based on the study of public data.

DataLAB: the Data Lab in Education

  • Team: iteNlearning, Ernesto Ferrández Bru.

What is it?

Data obtained with empirical Artificial Intelligence techniques such as big data or machine learning offer correlations, not causes. iteNlearning bases its technology on evidence-backed scientific models, as well as on data (from sources such as INE or the Basque Institute of Statistics - Eustat). These data are curated in order to assist teachers in decision making, once DataLAB identifies the specific needs of each student.

DataLAB Mathematics is a professional educational tool that, based on neuropsychological and cognitive models, measures the level of neurodevelopment of the specific cognitive processes developed by each student. This generates an educational scorecard that, based on data, informs us of the specific needs of each person (high ability, dyscalculia...) so that they can be enhanced and/or reinforced, allowing an evidence-based education.

The value of podcasting in digital education

  • Team: Adrián Pradilla Pórtoles and Débora Núñez Morales.

What is it?

2020 has been the year in which podcasts have taken off as a new digital format for the consumption of different areas of information. This idea seeks to take advantage of the boom of this tool to use it in the educational field so that students can learn in a more enjoyable and different way.

The proposal includes the official syllabus of secondary or university education, as well as competitive examinations, which can be obtained from open data sources and official websites. Through natural language processing technologies, these syllabi are associated with existing audios of teachers on history, English, philosophy, etc. on platforms such as iVoox or Spotify, resulting in a list of podcasts by course and subject.

The data sources used for this proposal include the Public Employment Offer of Castilla-La Mancha and the educational competences for the different stages.

MIPs Project

  • Team: Aday Melián Carrillo, Daydream Software.

What is it?

A MIP (Marked Information Picture) is a new interactive information tool, consisting of a series of interactive layers on static images that facilitate the retention of information and the identification of elements.

This project consists of a service for creating MIPs quickly and easily by manually drawing regions of interest on any image imported through the web. The created MIPs will be accessible from any device and have multiple applications as a teaching, personal and professional resource.

In addition to manual creation, the authors have implemented an automatic GeoJSON to MIP data converter in Python. As a first step, they have developed a MIP of Spanish provinces from this public database.
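The authors' converter and the MIP file format are not described here, so the following is only a minimal sketch of what a GeoJSON-to-regions conversion could look like in Python. The function names, the `label_key` parameter and the output dictionary (`width`, `height`, `regions`) are assumptions made for illustration, not the project's actual format. The sketch takes the exterior ring of each polygon and rescales longitude/latitude linearly to pixel coordinates of a target image.

```python
def iter_exterior_rings(geometry):
    """Yield the exterior ring of each polygon in a Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        yield geometry["coordinates"][0]
    elif geometry["type"] == "MultiPolygon":
        for polygon in geometry["coordinates"]:
            yield polygon[0]


def geojson_to_regions(feature_collection, width, height, label_key="name"):
    """Convert GeoJSON features into labelled regions in image pixel space.

    The returned structure is hypothetical; it is NOT the real MIP format.
    """
    features = feature_collection["features"]
    points = [
        point
        for feature in features
        for ring in iter_exterior_rings(feature["geometry"])
        for point in ring
    ]
    min_lon = min(p[0] for p in points)
    max_lon = max(p[0] for p in points)
    min_lat = min(p[1] for p in points)
    max_lat = max(p[1] for p in points)
    # Guard against a zero-size bounding box (all points on one line).
    lon_span = (max_lon - min_lon) or 1.0
    lat_span = (max_lat - min_lat) or 1.0

    def to_pixel(lon, lat):
        x = (lon - min_lon) / lon_span * width
        # Image y grows downwards, latitude grows upwards, so flip the axis.
        y = (max_lat - lat) / lat_span * height
        return [round(x, 1), round(y, 1)]

    regions = []
    for feature in features:
        label = feature.get("properties", {}).get(label_key, "")
        for ring in iter_exterior_rings(feature["geometry"]):
            polygon = [to_pixel(p[0], p[1]) for p in ring]
            regions.append({"label": label, "polygon": polygon})
    return {"width": width, "height": height, "regions": regions}


# Invented two-region example (not real province boundaries).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "A"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {"name": "B"},
         "geometry": {"type": "MultiPolygon",
                      "coordinates": [[[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]]}},
    ],
}
mip = geojson_to_regions(sample, width=200, height=100)
```

A linear rescaling ignores map projection, which may be acceptable for a small area drawn over a schematic image but would distort a real map of Spain; a proper projection step would be needed in that case.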

FRISCHLUFT

  • Team: Harut Alepoglian and Benito Cuezva, German School Cultural Association, Zaragoza.

What is it?

The Frischluft (Fresh Air) project is a hardware and software solution for measuring environmental parameters in the school. It aims to improve the thermal comfort of the classrooms and increase the protection of the students through intelligent ventilation, while consolidating a flagship project that drives the digital transformation of the school.

This proposal uses data sources from Zaragoza City Council on CO2 levels in the urban environment of the city and international data repositories to measure global emissions, which are compared through statistical techniques and machine learning models.

Next steps

All of these ideas have been able to capture how to best use data intelligence to develop real solutions in the education sector. The finalists now have 3 months to develop a prototype. The three prototypes that receive the best evaluation from the jury, according to the established evaluation criteria, will be awarded 4,000, 3,000 and 2,000 euros respectively.

Good luck to all participants!

Documentation

In order to extract the full value of data, it is necessary to classify, filter and cross-reference it through analytics processes that help us draw conclusions, turning data into information and knowledge. Traditionally, data analytics is divided into 3 categories:

  • Descriptive analytics, which helps us to understand the current situation, what has happened to get there and why it has happened.
  • Predictive analytics, which aims to anticipate relevant events. In other words, it tells us what is going to happen so that a human being can make a decision.
  • Prescriptive analytics, which provides information on the best decisions based on a series of future scenarios. In other words, it tells us what to do.

The third report in the "Awareness, Inspire, Action" series focuses on the second stage, Predictive Analytics. It follows the same methodology as the two previous reports on Artificial Intelligence and Natural Language Processing.

Predictive analytics allows us to answer business questions such as: Will we suffer a stockout, will the price of a certain share fall, or will more tourists visit us in the future? Based on this information, companies can define their business strategy, and public bodies can develop policies that respond to the needs of citizens.

After a brief introduction that contextualises the subject matter and explains the methodology, the report, written by Alejandro Alija, is developed as follows:

  • Awareness. The Awareness section explains the key concepts, highlighting the three attributes of predictive analytics: the emphasis on prediction, the business relevance of the resulting knowledge and its trend towards democratisation to extend its use beyond specialist users and data scientists. This section also mentions the mathematical models it makes use of and details some of its most important milestones throughout history, such as the Kyoto Protocol or its usefulness in detecting customer churn.
  • Inspire. The Inspire section analyses some of the most relevant use cases of predictive analytics today in three very different sectors. It starts with the industrial sector, explaining how predictive maintenance and anomaly detection works. It continues with examples relating to price and demand prediction, in the distribution chain of a supermarket and in the energy sector. Finally, it ends with the health sector and augmented medical imaging diagnostics.
  • Action. In the Action section, a concrete use case is developed in a practical way, using real data and technological tools. In this case, the selected dataset is traffic accidents in the city of Madrid, published by the Madrid City Council. Through the methodology shown in the following figure, it is explained in a simple way how to use time series analysis techniques to model and predict the number of accidents in future months.
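As a rough illustration of the kind of time series modelling mentioned in the Action section, the sketch below forecasts a monthly count series with a linear trend plus average monthly offsets. This is a deliberately simple stand-in technique, not the method used in the report, and the series is synthetic: it is not the Madrid City Council accident data. All function and variable names are hypothetical.

```python
PERIOD = 12  # months in one seasonal cycle


def fit_trend(values):
    """Least-squares straight line through the points (0, y0), (1, y1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return intercept, slope


def seasonal_offsets(values, intercept, slope, period=PERIOD):
    """Average deviation from the trend line for each position in the cycle."""
    buckets = [[] for _ in range(period)]
    for t, y in enumerate(values):
        buckets[t % period].append(y - (intercept + slope * t))
    # Positions with no observations get a zero offset.
    return [sum(b) / len(b) if b else 0.0 for b in buckets]


def forecast(values, horizon, period=PERIOD):
    """Extend the trend line and add back the seasonal offset of each month."""
    intercept, slope = fit_trend(values)
    offsets = seasonal_offsets(values, intercept, slope, period)
    n = len(values)
    return [
        intercept + slope * t + offsets[t % period]
        for t in range(n, n + horizon)
    ]


# Synthetic three-year series: invented level, trend and monthly pattern.
seasonal_shape = [-60, -40, 0, 10, 30, 20, -10, -90, 20, 50, 60, 10]
monthly_counts = [900 + 3 * t + seasonal_shape[t % PERIOD] for t in range(36)]
next_six_months = forecast(monthly_counts, horizon=6)
```

In practice a dedicated time series library and a held-out validation period would be preferable; the point here is only to show how a trend and a repeating monthly pattern can be combined into a prediction for future months.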

The report ends with the Last stop section, where courses, books and articles of interest are compiled for those users who want to continue advancing in the subject.

In this video, the author tells you more about the report and predictive analytics (only available in Spanish).

Below, you can download the full report in pdf and word (reusable version), as well as access the code used in the Action example at this link.

Documentation

The pandemic that began last year has brought about a significant change in the way we see the world and how we relate to it. As far as the education sector is concerned, students and teachers at all levels have been forced to replace face-to-face teaching and learning with remote, online methods.

In this context, within the framework of the Aporta Initiative, the study "Data-based educational technology to improve learning in the classroom and at home", by José Luis Marín, has been developed. This report offers several keys to reflect on the new challenges posed by this situation, which can be turned into opportunities if we manage to introduce changes that promote the improvement of the teaching-learning process beyond simply replacing face-to-face classes with online training.

The importance of data to improve the education sector

Through innovative educational technology based on data and artificial intelligence, some of the challenges facing the education system can be addressed. For this report, 4 of these challenges have been selected:

  • Remote supervision of assessment tests: monitoring and surveillance of evaluative tests through online resources.
  • Identification of behavioral or attention problems: alerting teachers to activities and behaviors that indicate attention, motivation or behavioral problems.
  • Personalized and more attractive training programs: adaptation of learning routes and pace of students' learning.
  • Improved performance on standardized tests: use of online learning platforms to improve results on standardized tests, to reinforce mastery of a particular subject, and to achieve fairer and more equitable assessment.

To address each of these four challenges, a simple structure divided into three sections is proposed:

  1. Description of the problem, which allows us to put the challenge in context.
  2. Analysis of some of the approaches based on the use of data and artificial intelligence that are used to offer a technological solution to the challenge in question.
  3. Examples of relevant or highly innovative solutions or experiences.

The report also highlights the enormous complexity involved in these types of issues, which should be approached with caution to avoid negative consequences for individuals, such as cybersecurity problems, invasion of privacy or the risk of excluding some groups, among others. To this end, the document ends with a series of conclusions that converge on the idea that the best way to generate better results for all students, alleviating inequalities, is to combine excellent teachers with excellent technology that enhances their capabilities. In this process, open data can play an even more relevant role in improving the state of the art in educational technology and ensuring more widespread access to certain innovations that are largely based on machine learning or artificial intelligence technologies.

In this video, the author tells us more about the report.

News

The year 2020 closed with the announcement of the winning projects of two competitions that sought to promote the reuse of open data in two Autonomous Communities: Castilla y León and Euskadi.

Winning projects of the 4th Castilla y León Government Open Data Competition

As in other years, the latest edition of the Junta de Castilla y León's open data competition was aimed at supporting and recognising projects that provide any type of idea, study, service, website or application for mobile devices, using the datasets of the Junta de Castilla y León's Open Data Portal.

A total of eight projects were awarded, divided into different categories as follows:

Ideas category

This category is aimed at those participants who have a great idea even though they do not have the technical capacity, time or resources to implement it.

  • The first prize was awarded to Cristina Pérez Fernández and César González Palomo. Their project, Castilla y León en remoto, presents a search engine for "the ideal town" for professionals who work remotely and wish to move to a place that will help them satisfy their desire for a change in lifestyle. Based on the personal preferences of each user, the search engine offers the options that best suit their needs and desires. To do so, it exploits data such as the availability of 4G and/or fibre optic coverage, number of inhabitants in the "ideal town", distance to places of interest, cultural activity, natural environment or average rent/purchase price per square metre.
  • The second prize went to Juan Carlos Solís Méndez for his CyL Mobility project. He has put his idea into practice through a first version of a website that brings together all the information collected on the establishments in the autonomous community that are adapted for people with reduced mobility. This is, without a doubt, an idea with clear social value, as it helps improve the quality of life of a vulnerable group.

Products and services category

This category sought projects that provided studies, services, websites or applications for mobile devices, that used datasets from the Junta de Castilla y León's Open Data Portal and that were accessible to all citizens via the web by means of a URL.

  • The first prize went to the Escovid19data project, a collaborative collection of visualisations and reusable data on COVID-19 by regions in Spain. This project helps to improve the data and information published by the Administration and encourages citizens to become more aware of the serious problem we face.
  • The second prize went to the Castilla y León Gurú project, an assistant based on a conversational bot in Telegram that offers users tourist, cultural and leisure information on the community.
  • The student award went to TurisCyL, an app for Android mobiles that serves as a travel guide for the autonomous community by offering as much information as possible about tourist sites (restaurants, accommodation, etc.), as well as museums or cultural events.

Didactic Resource Category

Within this section, the creation of innovative open educational resources (with Creative Commons licenses), which support teaching, is encouraged.

  • The prize in this category went to the Casual Learn project, an application for Android mobile devices that allows people to learn about art history from buildings and public spaces in Castilla y León. The app suggests learning tasks considering the interests and location of the user. For example, if the user passes near a Gothic church, Casual Learn can suggest taking a photo of its facade and comparing it with that of a nearby Romanesque church.

Data Journalism Category

This category rewards relevant journalistic pieces published or updated in any medium, whether written or audiovisual.

Winners of the Basque Country 2020 Open Data Ideas and Applications Competition

Also decided at the end of the year were the Basque Country Open Data Ideas and Applications competitions, which aim to publicise and promote the reuse of open data in the region. They are organised by the Basque Government together with the Provincial Councils of Alava, Bizkaia and Gipuzkoa and the City Councils of Bilbao, Donostia-San Sebastian and Vitoria-Gasteiz.

Winning projects of the Basque Country Open Data Ideas Competition 2020

The Ideas Competition is aimed at both individuals and companies who wish to submit "ideas for creating products or services derived from open data from the main public data catalogues in the Basque Country". The third edition admitted 30 candidatures, of which two were awarded prizes:

  • The first prize in this category went to the Basque Country Seasonal Pollen Forecasting Service project, by Ortzi Torices Roldán and Hodei Goncalves Barkaiztegi. This is a proposal to create a neural network to forecast pollen levels in the Basque Country and to offer a public service for people suffering from allergies and respiratory conditions.
  • The second prize was awarded to the Ongi etorri Euskal Herrira project, by Iker Díez Arancibia and Alberto Nieto de Pablos. This project proposes an application based on the generation of plans that bring together the types of activities desired by each tourist in a limited geographical area. It offers tourists a graphic representation of the different plans they request and allows for the joint booking of the activities that make up the plan.

Winning projects of the Basque Country Open Data Applications Competition 2020

For its part, the Applications Competition is aimed at both individuals and companies that have created or wish to create "products or services derived from open data from the main public data catalogues in the Basque Country". Of the 28 candidatures admitted, the following have been awarded prizes:

  • The first prize was awarded to Smart Public Tender, by Manuel José García Rodríguez, a web platform that includes the latest innovations in the field of public procurement and which helps both public administrations and tendering companies in their decision-making using Machine Learning methodology.
  • The second prize was awarded to AvatarParking, by Unai Antero Urruticoechea and Beatriz Arenal Redondo. It is an application that is designed as an assistant for car parks in San Sebastian. By accessing the user's location in real time, it indicates the nearest car park, the number of free spaces available, possible incidents on the way, as well as an estimate of the cost of leaving the car there. The application is designed to be actively carried on the mobile phone and to receive information and commands by voice, thus avoiding distractions with the mobile phone while driving.

Congratulations to all the winners!

Blog

Open data is an increasingly used resource for the training of students at different stages of the education system and for the continuous training of professionals from all sectors. There is already little doubt about the growing importance of all the skills related to data analysis and processing in relation to almost any knowledge discipline. Similarly, skills related to visualization and the construction of stories based on the conclusions drawn from any data analysis or modeling are increasingly needed to complement and extend the ever-necessary skills to communicate and present results of any kind of work.

Throughout the process of training professionals related to data science and artificial intelligence, open data is a valuable resource for gaining practical experience with the techniques and tools that are common in the profession. However, the effects that the use of data, usually open, has on the learning of other subjects, on the acquisition of other types of skills and even on the motivation of students towards learning, are also beginning to be appreciated.

As early as 2013, research that conducted a detailed quantitative comparison of different educational approaches adopted by 39 schools in New York City showed that the use of data to guide the educational programme was one of the five main policies that had an effect on improving academic performance.

Although the use of open data in the classroom has not been widely studied, the limited research conducted so far suggests at first glance that there is a lack of awareness of open data among educators. While we do not have a consolidated understanding of the effect of the use of open data in educational settings as it is not currently a widespread educational resource, there does appear to be a set of early adopter educators who make substantial use of open data in their teaching programmes.

The research "The use of open data as a material for learning" by the Institute of Educational Technology is based on the qualitative analysis of the experience of a group of these pioneering educators to draw a number of conclusions about the value of using open data in teaching.

One of the starting points is that open data does not seem to offer completely new educational or pedagogical methodologies, but rather its use complements existing concepts of teaching and learning such as research-based or project-based learning or personalisation of learning. Two conclusions stand out in this respect:

  1. Open datasets used as part of learning projects in any subject are usually relevant to the learner, either because they describe issues in their geographical or social environment, or because they relate to current issues or their own hobbies. Research shows that the mere use of these datasets that arouse student curiosity during the learning of any concept has positive effects on the motivation of students to go deeper into the subject and appreciate its usefulness.
  2. The use of open datasets makes it possible to propose more advanced activities without increasing the difficulty of the training programme. The research cites examples ranging from the use of open data to support the statistics training of high school students to the use of open scientific databases in the area of genomics to support the teaching of bioinformatics concepts. In this way students can acquire more advanced knowledge and skills that would otherwise probably only have been acquired in professional practice or would have been left out for lack of time in the programme. This effect, especially at higher education levels, would also contribute to closing the gap between the education system and professional practice.

Although their effect has not yet been studied, open data competitions are another vehicle for channelling the practical training of students and for creating new educational resources. Increasingly, universities or secondary schools are encouraging teams to participate in regional or national open data competitions as an activity within certain subjects. Some competitions, such as the Castilla y León open data competition, even have a special category with a corresponding prize reserved for the participation of students.

Along the same lines, Barcelona City Council has been organising the Barcelona Dades Obertes Challenge for four years, which aims to bring the benefits of open data closer to the public and to promote its use in the city's educational centres. The challenge combines competition between schools, which have to develop a data-based project, with a specific training plan on open data for teachers, so that they can guide their students.

The fact that open datasets are not more widely used in educational programmes can be attributed to factors such as the lack of teacher training or the difficulty of adapting existing data. Most open datasets come from professional environments such as scientific research or public service administration, and learners and educators may not have the literacy or resources to take advantage of them, even though tools are emerging that simplify some of the complexities of working with open data. In this regard, a stronger relationship and joint work between educators, learners and dataset producers could also encourage the deployment of more learning programmes.

This is why there are interesting initiatives such as UDIT (Use Open Research Data In Teaching) launched in 2017 with the aim of encouraging and helping higher education teachers to incorporate open research data and other open science concepts into their teaching to improve the learning process.

The International Open Data Charter already recognises the importance of engaging "with schools and higher education institutions to support further open data research and to incorporate data literacy into education programmes". The value of open data in the learning process has not yet been sufficiently explored. As an example, the usual discourse of the open data community always highlights the potential economic and social value of reuse, but not so much the potential of its use in education.


Content prepared by Jose Luis Marín, Senior Consultant in Data, Strategy, Innovation & Digitalization.

The contents and points of view reflected in this publication are the sole responsibility of its author.
