News

The rise of smart cities, the distribution of resources during pandemics or the fight against natural disasters has awakened interest in geographic data. In the same way that open data in the healthcare field helps to implement social improvements related to the diagnosis of diseases or the reduction of waiting lists, Geographic Information Systems help to streamline and simplify some of the challenges of the future, with the aim of making them more environmentally sustainable, more energy efficient and more livable for citizens. 

As in other fields, professionals dedicated to optimizing Geographic Information Systems (GIS) also build their own working groups, associations and training communities. GIS communities are groups of volunteers interested in using geographic information to maximize the social benefits that this type of data can bring in collective terms.  

Thus, by addressing the different approaches offered by the field of geographic information, data communities work on the development of applications, the analysis of geospatial information, the generation of cartographies and the creation of informative content, among others.  

In the following lines, we will analyze, step by step, the commitment and objectives of three GIS communities that are currently active. 

Gis and Beers 

What is it and what is its objective? 

Gis and Beers is an association focused on the dissemination, analysis and design of tools linked to geographic information and cartographic data. Specialized in sustainability and environment, they use open data to propose and disseminate solutions that seek to design a sustainable and nature-friendly environment. 

What functions does it perform? 

In addition to disseminating specialized content such as reports and data analysis, the members of Gis and Beers offer training resources dedicated to facilitating the understanding of geographic information systems from an environmental perspective. It is common to read articles on their website focused on new environmental data or watch tutorials on how to access open data platforms specialized in the environment or the tools available for their management. Likewise, every time they detect the publication of a new open data catalog, they share on their website the necessary instructions for downloading the data, managing it and representing it cartographically. 

Next steps  

In line with the environmental awareness that marks the project, Gis and Beers is devoting more and more effort to strengthening two key pillars of its content: raising awareness of the importance of citizen science (a collaborative movement that provides data observed by citizens) and promoting access to data that can be used for modeling without first having to be adapted to cartographic analysis needs. 

The role of open data 

Most of the open data they use comes from state sources such as the IGN, AEMET or INE, although they also draw on other options such as those offered by Google Earth Engine and Google Public Data.  

How to contact them? 

If you are interested in learning more about the work of this community or need to contact Gis and Beers, you can visit their website or write directly to this email account.  

Geovoluntarios 

What is it and what is its objective? 

It is a non-profit organization formed by professionals experienced in the use and remote application of geospatial technology, whose objective is to cooperate with other organizations that provide support in emergency situations and in projects aligned with the Sustainable Development Goals. 

The association's main objectives are: 

  • To provide help to organizations in any phase of an emergency, prioritizing non-profit, life-saving organizations or those supporting the third sector, such as the Red Cross, Civil Protection and humanitarian organizations. 
  • Encourage digital volunteering among people with knowledge or interest in geospatial technologies and working with geolocated data. 
  • Find ways to support organizations working towards the Sustainable Development Goals (SDGs). 
  • Provide geospatial tools and geolocated data to non-profit projects that would otherwise not be technically or economically feasible. 

What functions does it perform? 

The professional experience accumulated by the members of Geovolunteers allows them to offer support in tasks related to the analysis of geographic data, the design of models and the monitoring of special emergency situations. Thus, the most common functions they carry out as an NGO can be summarized as follows: 

  • Training and providing means to volunteers and organizations in all the aspects necessary to provide aid with guarantees: geographic information systems, spatial analysis, GDPR compliance, security, etc. 
  • Facilitate the creation of temporary work teams to respond to requests for assistance received and that are in line with the organization's goals. 
  • Create working groups that maintain data that serve a general purpose. 
  • Seek collaboration agreements with other entities, organize and participate in events and carry out campaigns to promote digital volunteering.

From a more specific point of view, among all the projects in which Geovolunteers has participated, two initiatives in which the members were particularly involved are worth mentioning. On the one hand, the Covid data project, where a community of digital volunteers committed to the search and analysis of reliable data was created to provide quality information on the situation being experienced in each of the different autonomous communities of Spain. Another initiative to highlight was Reactiva Madrid, an event organized by the Madrid City Council and Esri Spain, which was created to identify and develop work that, through citizen participation, would help to prevent and/or solve problems related to the pandemic caused by COVID-19 in the areas of the economy, mobility and society. 

Next steps 

After two years focused on solving part of the problems generated by the Covid-19 crisis, Geovolunteers continues to focus on collaborating with organizations that are committed to assisting the most vulnerable people in emergency situations, without forgetting the commitment that links them to meeting the Sustainable Development Goals.  

Thus, one of the projects in which the volunteers are most active is the implementation and improvement of GeoObs, an app to geolocate different observation projects on: dirty spots, fire danger, dangerous areas for bikers, improving a city, safe cycling, etc. 

The role of open data 

For an NGO like Geovolunteers, open data is essential both to develop the solidarity tasks they carry out together with other associations and to design their own services and applications. Hence, these resources are part of the new functionalities on which the association wants to focus.  

So much so that data collection marks a starting point for the pilot projects currently found under the Geovolunteers umbrella. The application mentioned above is an example of how generating data by observation can enrich the available open data catalogs. 

GIS Community 

What is it and what is its objective? 

GIS Community is a virtual collective that brings together professionals in the field of geographic data and related information systems. Founded in 2009, they disseminate their work through social networks such as Facebook, Twitter and Instagram, where they also share news and relevant information on geotechnology, geoprocessing and land-use planning, among other topics. 

Its objective is to expand informative, relevant knowledge for the geographic data community, a virtual space with little presence when this project began its work on the Internet.  

What functions does it perform? 

In line with the objectives mentioned above, the tasks carried out by GIS Community focus on sharing and generating content related to Geographic Information Systems. Given the diversity of fields and sectors of action within the discipline, they try to balance the content of their publications to bring together both those who seek information and those who provide opportunities. For this reason it is possible to find news about events, training, research projects, entrepreneurs or specialized literature, among many others. 

Next steps 

Aware of their weight as a community within the field of geographic data, GIS Community plans to strengthen four axes that directly affect the project's work: organizing lectures and webinars, contacting organizations and institutions capable of funding projects in the GIS area, seeking entities that provide open geospatial information and, finally, getting part of the private sector to participate financially in the education and training of GIS professionals.  

The role of open data 

This is a community that is closely linked to the universe of open data, because it shares content that can be used, supplemented and redistributed freely by users. In fact, according to its own members, there is an increasing acceptance and preference for this trend, with community collaborators and their own projects driving the debate and interest in using open data in all active phases of their tasks or activities. 

How to contact them? 

As in the previous cases, if you are interested in contacting Comunidad SIG, you can do so through their Facebook, Twitter or Instagram pages or by sending an email to the following address.  

Communities like Gis and Beers, Comunidad SIG or Geovolunteers are just a small sample of the work that the GIS collective is currently developing. If you are part of a data community in this or any other field, or you know about the work of communities that may be of interest for datos.gob.es, do not hesitate to send us an email at dinamizacion@datos.gob.es.

Geo Developers

What is it and what is its purpose?

Geo Developers is a community whose objective is to bring together developers and surveyors in the field of geographic data. Its main function is to share different professional experiences related to geographic data and, for this purpose, it organizes talks where everyone can share their experience and knowledge with the rest.

Through their YouTube channel it is possible to access the training sessions and talks held to date, as well as to keep track of upcoming ones.

The role of open data

Although this is not a community focused on the reuse of open data as such, they use it to develop some projects and extract new learnings that they then incorporate into their workflows.

Next steps and contact

The main objective for the future of Geo Developers is to grow the community in order to continue sharing experiences and knowledge with the rest of the GIS stakeholders. If you want to get in touch and follow the evolution of this project, you can do so through its Twitter profile.

Blog

According to the latest analysis of Artificial Intelligence trends conducted by Gartner in September 2021, chatbots are one of the technologies closest to delivering effective productivity in less than two years. Figure 1, extracted from this report, shows that four technologies are well past the peak of inflated expectations and are already starting to move out of the trough of disillusionment towards states of greater maturity and stability: chatbots, semantic search, machine vision and autonomous vehicles.

Figure 1 - Trends in AI for the coming years.

In the specific case of chatbots, there are great expectations for productivity in the coming years thanks to the maturity of the different platforms available, both as Cloud Computing services and as open source projects such as RASA or Xatkit. It is currently relatively easy to develop a chatbot or virtual assistant on these platforms without deep AI knowledge.

How does a chatbot work?

As an example, Figure 2 shows a diagram of the different components that a chatbot usually includes, in this case focused on the architecture of the RASA project.

Figure 2 - RASA project architecture.

One of the main components is the agent module, which acts as a controller of the data flow and is normally the system interface with the different input/output channels offered to users, such as chat applications, social networks, web or mobile applications, etc.

The NLU (Natural Language Understanding) module is responsible for identifying the user's intention (what they want to consult or do), extracting entities (what they are talking about) and generating responses. It is considered a pipeline because several processes of varying complexity are involved, in many cases even through the use of pre-trained Artificial Intelligence models.

Finally, the dialogue policies module defines the next step in a conversation, based on context and message history. This module is integrated with other subsystems such as the conversation store (tracker store) or the server that processes the actions necessary to respond to the user (action server).
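To make these components concrete, here is a toy, stdlib-only Python sketch of the same three ideas: an NLU step (intent classification plus entity extraction), a dialogue policy that uses a conversation tracker, and a placeholder for the action server. This is not RASA code; all intent names, patterns and replies are invented for illustration.

```python
import re

# Toy intent patterns standing in for a trained NLU model (illustrative only).
INTENT_PATTERNS = {
    "greet": re.compile(r"\b(hello|hi|good morning)\b", re.I),
    "ask_population": re.compile(r"\bpopulation\b", re.I),
}

# Toy entity extractor: recognizes a few hypothetical city names.
CITY_ENTITY = re.compile(r"\b(madrid|seville|zaragoza)\b", re.I)

def nlu(message: str) -> dict:
    """Minimal 'NLU pipeline': classify the intent and extract entities."""
    intent = next((name for name, pattern in INTENT_PATTERNS.items()
                   if pattern.search(message)), "fallback")
    entities = [match.lower() for match in CITY_ENTITY.findall(message)]
    return {"intent": intent, "entities": entities}

def policy(parsed: dict, tracker: list) -> str:
    """Minimal 'dialogue policy': choose the next action from intent and context."""
    tracker.append(parsed)  # conversation store ('tracker store')
    if parsed["intent"] == "greet":
        return "Hello! Ask me about the population of a city."
    if parsed["intent"] == "ask_population" and parsed["entities"]:
        # A real 'action server' would query an open data source here.
        return f"Looking up the population of {parsed['entities'][0].title()}..."
    return "Sorry, I did not understand that."

tracker: list = []
print(policy(nlu("Hi there"), tracker))
print(policy(nlu("What is the population of Seville?"), tracker))
```

Real platforms replace the regular expressions with trained language models, but the flow (agent receives a message, the NLU pipeline parses it, the policy picks an action based on the tracker) is the same.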

Chatbots in open data portals as a mechanism to locate data and access information

There are more and more initiatives to empower citizens to consult open data through the use of chatbots, using natural language interfaces, thus increasing the net value offered by such data. The use of chatbots makes it possible to automate data collection based on interaction with the user and to respond in a simple, natural and fluid way, allowing the democratization of the value of open data.

At SOM Research Lab (Universitat Oberta de Catalunya) they were pioneers in the application of chatbots to improve citizens' access to open data through the Open Data for All and BODI (Bots to interact with open data - Conversational interfaces to facilitate access to public data) projects. You can find more information about the latter project in this article.

It is also worth mentioning the Aragón Open Data chatbot, from the open data portal of the Government of Aragón, which aims to bring the large amount of data available to citizens, so that they can take advantage of its information and value, avoiding any technical or knowledge barrier between the query made and the existing open data. The domains on which it offers information are: 

  • General information about Aragon and its territory
  • Tourism and travel in Aragon
  • Transportation and agriculture
  • Technical assistance or frequently asked questions about the information society.

Conclusions

These are just a few examples of the practical use of chatbots in the valorization of open data and their potential in the short term. In the coming years we will see more and more examples of virtual assistants in different scenarios, both in the field of public administrations and in private services, especially focused on improving user service in e-commerce applications and services arising from digital transformation initiatives.


Content prepared by José Barranquero, expert in Data Science and Quantum Computing.

The contents and points of view reflected in this publication are the sole responsibility of the author.

Event

The pandemic situation we have experienced in recent years has led to a large number of events being held online. This was the case of the Iberian Conference on Spatial Data Infrastructures (JIIDE), whose 2020 and 2021 editions had a virtual format. However, the situation has changed and in 2022 we will be able to meet again to discuss the latest trends in geographic information.

Seville will host JIIDE 2022

Seville has been the city chosen to bring together all those professionals from the public administration, private sector and academia interested in geographic information and who use Spatial Data Infrastructures (SDI) in the exercise of their activities.

Specifically, the event will take place from 25 to 27 October at the University of Seville. You can find more information here.

Focus on user experience

This year's slogan is "Experience and technological evolution: bringing the SDI closer to citizens".  The aim is to emphasise new technological trends and their use to provide citizens with solutions that solve specific problems, through the publication and processing of geographic information in a standardised, interoperable and open way.

Over three days, attendees will be able to share experiences and use cases on how to use Big Data, Artificial Intelligence and Cloud Computing techniques to improve the analysis capacity, storage and web publication of large volumes of data from various sources, including real-time sensors.

New specifications and standards that have emerged will also be discussed, as well as the ongoing evaluation of the INSPIRE Directive.

Agenda now available

Although some participations are still to be confirmed, the programme is already available on the conference website. There will be around 80 communications where experiences related to real projects will be presented, 7 technical workshops where specific knowledge will be shared and a round table to promote debate.

Among the presentations there are some focused on open data. This is the case of Valencia City Council, which will talk about how they use open data to obtain environmental equity in the city's neighbourhoods, or the session dedicated to the "Digital aerial photo library of Andalusia: a project for the convergence of SDIs and Open-Data".

How can I attend?

The event is free of charge, but to attend you need to register using this form. You must indicate the day you wish to attend.

For the moment, registration is open to attend in person, but in September, the website of the conference will offer the possibility of participating in the JIIDE virtually.

Organisers

The Jornadas Ibéricas de Infraestructuras de Datos Espaciales (JIIDE) were born from the collaboration of the Directorate General of Territory of Portugal, the National Geographic Institute of Spain and the Government of Andorra. On this occasion, the Institute of Statistics and Cartography of Andalusia and the University of Seville join as organisers.

 

Reuser company

KSNET (Knowledge Sharing Network S.L) is a company dedicated to the transfer of knowledge that aims to improve programmes and policies with both a social and economic impact. That is why they accompany their clients throughout the process of creating these programmes, from the diagnosis, design and implementation phase to the evaluation of the results and impact achieved, also providing a vision of the future based on proposals for improvement.

Reuser company

Estudio Alfa is a technology company dedicated to offering services that promote the image of companies and brands on the Internet, including the development of apps. To carry out these services, they use techniques and strategies that comply with usability standards and favour positioning in search engines, thus helping their clients' websites to receive more visitors and thus potential clients. They also have special experience in the production and tourism sectors.

 

Interview

A few months ago, Facebook surprised us all with a name change: it became Meta. This change alludes to the concept of "metaverse" that the brand wants to develop, uniting the real and virtual worlds, connecting people and communities.

Among the initiatives within Meta is Data for Good, which focuses on sharing data while preserving people's privacy. Helene Verbrugghe, Public Policy Manager for Spain and Portugal at Meta, spoke to datos.gob.es to tell us more about data sharing and its usefulness for the advancement of the economy and society.

Full interview:

1. What types of data are provided through the Data for Good Initiative?

Meta's Data For Good team offers a range of tools including maps, surveys and data to support our 600 or so partners around the world, ranging from large UN institutions such as UNICEF and the World Health Organization, to local universities in Spain such as the Universitat Politècnica de Catalunya and the University of Valencia.

To support the international response to COVID-19, data such as those included in our Movement Range Maps have been used extensively to measure the effectiveness of stay-at-home measures, and in our COVID-19 Trends and Impact Survey to understand issues such as reluctance to vaccinate and inform outreach campaigns. Other tools, such as our high-resolution population density maps, have been used to develop rural electrification plans and five-year water and sanitation investments in places such as Rwanda and Zambia. We also have AI-based poverty maps that have helped extend social protection in Togo and an international social connectivity index that has been useful for understanding cross-border trade and financial flows. Finally, we have recently worked to support groups such as the International Federation of the Red Cross and the International Organization for Migration in their response to the Ukraine crisis, providing aggregated information on the volumes of people leaving the country and arriving in places such as Poland, Germany and the Czech Republic.

Privacy is built into all our products by default; we aggregate and de-identify information from Meta platforms, and we do not share anyone's personal information.

 

2. What is the value for citizens and businesses? Why is it important for private companies to share their data?

Decision-making, especially in public policy, requires information that is as accurate as possible. As more people connect and share content online, Meta provides a unique window into the world. The reach of Facebook's platform across billions of people worldwide allows us to help fill key data gaps. For example, Meta is uniquely positioned to understand what people need in the first hours of a disaster or in the public conversation around a health crisis - information that is crucial for decision-making but was previously unavailable or too expensive to collect in time.

For example, to support the response to the crisis in Ukraine, we can provide up-to-date information on population changes in neighbouring countries in near real-time, faster than other estimates. We can also collect data at scale by promoting Facebook surveys such as our COVID-19 Trends and Impact Survey, which has been used to better understand how mask-wearing behaviour will affect transmission in 200 countries and territories around the world.  

3. The information shared through Data for Good is anonymised, but what is the process like? How is the security and privacy of user data guaranteed?

Data For Good respects the choices of Facebook users. For example, all Data For Good surveys are completely voluntary. For location data used for Data For Good maps, users can choose whether they want to share that information from their location history settings. 

We also strive to share how we protect privacy by publishing blogs about our methods and approaches. For example, you can read about our differential privacy approach to protecting mobility data used in the response to COVID-19 here.
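As a generic illustration of the idea behind differential privacy mentioned in the answer above, the following sketch applies the textbook Laplace mechanism to a count query. This is a simplified educational example, not Meta's actual production method; the count and epsilon values are invented.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    The sensitivity of a counting query is 1 (one person can change the
    count by at most 1), so the noise scale is 1/epsilon: smaller epsilon
    means stronger privacy and noisier answers.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
true_count = 1000  # e.g. people moving between two regions (hypothetical)
noisy = private_count(true_count, epsilon=0.5, rng=rng)
print(round(noisy, 1))
```

Aggregates released this way remain statistically useful (the noise averages out over many queries or regions) while no individual's presence can be confidently inferred from any single released number.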

4. What other challenges have you encountered in setting up an initiative of this kind and how have you overcome them?

When we started Data For Good, the vast majority of our datasets were only available through a licensing agreement, which was a cumbersome process for some partners and unfeasible for many governments. However, at the onset of the COVID-19 pandemic, we realised that, in order to operate at scale, we would need to make more of our work publicly available, while incorporating stringent measures, such as differential privacy, to ensure security. In recent years, most of our datasets have been made public on platforms such as the Humanitarian Data Exchange, and through this tool and other APIs, our public tools have been queried more than 55 million times in the past year. We are proud of the move towards open source sharing, which has helped us overcome early difficulties in scaling up and meeting the demand for our data from partners around the world.

5. What are Meta's future plans for Data for Good?

Our goal is to continue to help our partners get the most out of our tools, while continuing to evolve and create new ways to help solve real-world problems. In the past year, we have focused on growing our toolkit to respond to issues such as climate change through initiatives such as our Climate Change Opinion Survey, which will be expanded this year; as well as evolving our knowledge of cross-border population flows, which is proving critical in supporting the response to the crisis in Ukraine.

 

Documentation

It is important to publish open data following a series of guidelines that facilitate its reuse, including the use of common schemas, such as standard formats, ontologies and vocabularies. In this way, datasets published by different organizations will be more homogeneous and users will be able to extract value more easily.

One of the most recommended families of formats for publishing open data is RDF (Resource Description Framework), a standard web data interchange model recommended by the World Wide Web Consortium and highlighted in the FAIR principles and the five-star scheme for open data publishing.

RDF is the foundation of the semantic web, as it allows representing relationships between entities, properties and values, forming graphs. In this way, data and metadata are automatically interconnected, generating a network of linked data that facilitates their exploitation by re-users. This also requires the use of agreed data schemas (vocabularies or ontologies), with common definitions to avoid misunderstandings or ambiguities.

In order to promote the use of this model, from datos.gob.es we provide users with the "Practical guide for the publication of linked data", prepared in collaboration with the Ontology Engineering Group team (Artificial Intelligence Department, ETSI Informáticos, Polytechnic University of Madrid).

The guide highlights a series of best practices, tips and workflows for the creation of RDF datasets from tabular data, in an efficient and sustainable way over time.

Who is the guide aimed at?

The guide is aimed at those responsible for open data portals and those preparing data for publication on such portals. No prior knowledge of RDF, vocabularies or ontologies is required, although a technical background in XML, YAML, SQL and a scripting language such as Python is recommended.

What does the guide include?

After a short introduction, the necessary theoretical concepts (triples, URIs, domain-controlled vocabularies, etc.) are addressed, while explaining how information is organized in an RDF graph and how naming strategies work.

Next, the guide describes in detail the steps to transform a CSV data file (the most common format in open data portals) into a normalized RDF dataset that uses controlled vocabularies and is enriched with external data to enhance the context of the original data. These steps are as follows:

Steps to transform CSV data to RDF (source: Practical guide for the publication of linked data, datos.gob.es):

  • Step 1: Selection of a controlled vocabulary for the domain. 
  • Step 2: Cleaning and preparation of the CSV data. 
  • Step 3: Construction of transformation rules (mappings). 
  • Step 4: Generation of RDF data from the rules. 
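As an illustrative, stdlib-only sketch of how such a transformation might look in code (this is not the guide's actual workflow, which relies on OpenRefine and mapping rules; the sample CSV, base URI and vocabulary choices below are all hypothetical):

```python
import csv
import io

# Hypothetical CSV with messy whitespace, as often downloaded from portals.
RAW_CSV = """name , population
 Seville ,688711
Zaragoza, 675301
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
SCHEMA = "http://schema.org/"           # controlled vocabulary for the domain
BASE = "http://example.org/city/"       # hypothetical URI naming strategy

# Transformation rules (mappings): column -> (property URI, datatype URI).
MAPPINGS = {
    "name": (SCHEMA + "name", XSD + "string"),
    "population": (SCHEMA + "population", XSD + "integer"),
}

def clean(row: dict) -> dict:
    """Cleaning step: normalize headers and trim stray whitespace."""
    return {key.strip(): value.strip() for key, value in row.items()}

def row_to_ntriples(row: dict) -> list:
    """Generation step: emit N-Triples lines for one cleaned CSV row."""
    subject = f"<{BASE}{row['name'].lower()}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SCHEMA}City> ."]
    for column, (prop, datatype) in MAPPINGS.items():
        lines.append(f'{subject} <{prop}> "{row[column]}"^^<{datatype}> .')
    return lines

triples = []
for raw_row in csv.DictReader(io.StringIO(RAW_CSV)):
    triples.extend(row_to_ntriples(clean(raw_row)))

print("\n".join(triples))
```

In practice, mapping languages and libraries handle these steps at scale, but the pipeline is the same: choose a vocabulary, clean the tabular data, define the column-to-property mappings, and generate the triples.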

The guide ends with a section aimed at more technical profiles, which implements an example of exploiting the generated RDF data using some of the most common programming libraries and triple-store databases.

Additional materials

The practical guide for publishing linked data is complemented by a cheatsheet that summarizes the most important information in the guide and a series of videos that help to understand the set of steps carried out for the transformation of CSV files into RDF. The videos are grouped in two series that relate to the steps explained in the practical guide:

1) Series of explanatory videos for the preparation of CSV data using OpenRefine. This series explains the steps to be taken to prepare a CSV file for its subsequent transformation into RDF:

  • Video 1: Pre-loading tabular data and creating an OpenRefine project.
  • Video 2: Modifying column values with transformation functions.
  • Video 3: Generating values for controlled lists or SKOS.
  • Video 4: Linking values with external sources (Wikidata) and downloading the file with the new modifications.

2) Series of explanatory videos for the construction of transformation rules or CSV to RDF mappings.  This series explains the steps to be taken to transform a CSV file into RDF by applying transformation rules.

  • Video 1: Downloading the basic template for the creation of transformation rules and creating the skeleton of the transformation rules document.
  • Video 2: Specifying the references for each property and how to add the Wikidata reconciled values obtained through OpenRefine.

Below you can download the complete guide, as well as the cheatsheet. To watch the videos, visit our YouTube channel.

Interview

Google is a company with a strong commitment to open data. It has launched Google Dataset Search, to locate open data in existing repositories around the world, and also offers its own datasets in open format as part of its Google Research initiative. In addition, it is a reuser of open data in solutions such as Google Earth.

Among its areas of work is Google for Education, with solutions designed for teachers and students. In datos.gob.es we have interviewed Gonzalo Romero, director of Google for Education in Spain and member of the jury in charge of evaluating the proposals received in the III edition of Desafío Aporta. Gonzalo talked to us about his experience, the influence of open data in the education sector and the importance of open data.

Full interview:

1. What challenges does the education sector face in Spain and how can open data and data-driven technologies help to overcome them?

Last year, due to the pandemic, the education sector was forced to accelerate its digitalization process so that the activity could develop as normally as possible.

The main challenges facing the education sector in Spain are technology and digitization as this sector is less digitized than average. Secure, simple and sustainable digital tools are needed so that the education system, from teachers and students to administrators, can operate easily and without any problems.

Open data makes it possible to locate certain quality information from thousands of sources quickly and easily at any time. These repositories create a reliable data sharing ecosystem that encourages publishers to publish data to drive student learning and the development of technology solutions.

2. Which datasets are most in demand for implementing educational solutions?

Each region usually generates its own datasets. The main challenge is how to create, in collaboration, new datasets with the variables needed to build predictive models that anticipate the main challenges the sector faces, such as school dropout, personalization of learning or academic and professional orientation, among others.

3. How can initiatives such as hackathons or challenges help drive data-driven innovation? How was your experience in the III Aporta Challenge?

It is essential to support projects and initiatives that develop innovative solutions to promote the use of data.

Technology offers tools that help to find synergies between public and private data to develop technological solutions and promote different skills among students.

4. In addition to being the basis for technological solutions, open data also plays an important role as an educational resource in its own right, as it can provide knowledge in multiple areas. To what extent does this type of resource foster critical thinking in students?

The use of open data in the classroom is a way to boost and foster students' educational skills. To make good use of these resources, it is important to search for and filter information according to one's needs, as well as to improve the ability to analyse data and argue in a reasoned way. In addition, it allows students to handle technological programs and tools.

These skills are useful for the future not only academically but also in the labour market, since more and more professionals with skills related to analytical capacity and data management are in demand.

5. Through your Google Research initiative, multiple projects are being carried out, some of them linked to the opening and reuse of open data. Why is it important that private companies also open data?

We understand the difficulties that private companies may face in sharing data, since sharing their information can be an advantage for competitors. However, it is essential to combine public and private sector data to drive the growth of the open data market, which can lead to new analyses and studies and the development of new products and services.

It is also important to approach data reuse in the light of new and emerging social challenges and to facilitate the development of solutions without having to start from scratch.

6. What are Google's future plans for open data?

Sensitive corporate data has high survivability requirements in case a provider has to cancel cloud services due to policy changes in a country or region, and we believe it is not possible to guarantee that with a proprietary solution. However, we do have open source and open standards tools that address multiple customer concerns.

Data analysis tools such as BigQuery or BigQuery Omni allow customers to make their own data more open, both inside and outside their organization. The potential of that data can then be harnessed in a secure and cost-efficient way. We already have clear use cases of value created with our data and artificial intelligence technology, and endorsed by the CDTI, such as the Student Success data dropout prevention model. Leading educational institutions already use it on a daily basis and it is in pilot phase in some education departments.

The company's goal is to continue working to build an open cloud hand in hand with our local partners and public institutions in Spain and across Europe, creating a secure European digital data ecosystem with the best technology.

Interview

Open data is not only a matter for public administrations; more and more companies are also betting on it. This is the case of Microsoft, which has provided access to selected open data in Azure designed for training Machine Learning models. The company also collaborates in the development of multiple projects to promote open data. In Spain, it has collaborated in the development of the HealthData 29 platform, intended for the publication of open data to promote medical research.

We have interviewed Belén Gancedo, Director of Education at Microsoft Ibérica and member of the jury in the III edition of the Aporta Challenge, focused on the value of data for the education sector. We met with her to talk about the importance of digital education and innovative data-driven solutions, as well as the role of open data in the business sector.

Complete interview:

1. What urgent challenges in the education sector has the pandemic revealed in Spain?

Technology has become an essential element in the new way of learning and teaching. During the last months, marked by the pandemic, we have seen how a hybrid education model - face-to-face and remote - has taken shape in a very short time. We have seen examples of schools that, in record time, in less than two weeks, have had to accelerate the digitization plans they already had in mind.

Technology has gone from being a temporary lifeline, enabling classes to be taught in the worst stage of the pandemic, to becoming a fully integrated part of the teaching methodology of many schools. According to a recent YouGov survey commissioned by Microsoft, 71% of elementary and middle school educators say that technology has helped them improve their methodology and improved their ability to teach. In addition, 82% of teachers report that the pace at which technology has driven innovation in teaching and learning has accelerated in the past year.

Before this pandemic, in some way, those of us who had been dedicating ourselves to education were the ones who defended the need to digitally transform the sector and the benefits that technology brought to it. However, the experience has served to make everyone aware of the benefits of the application of technology in the educational environment. In that sense, there has been an enormous advance. We have seen a huge increase in the use of our Teams tool, which is already used by more than 200 million students, teachers, and education staff around the world.

The biggest challenges currently, then, are not only to take advantage of data and Artificial Intelligence to provide more personalized experiences and operate with greater agility, but also to integrate technology with pedagogy, which will allow more flexible, engaging and inclusive learning experiences. Students are increasingly diverse, and so are their expectations about the role of college education in their journey to employment.


2. How can open data help drive these improvements? What technologies need to be implemented to drive improvements in the efficiency and effectiveness of the learning system?

Data is present in all aspects of our lives. Although we may not relate it to the mathematics or algorithms that govern predictive analytics, its impact can be seen in education, for example in detecting learning difficulties before it is too late. This can help teachers and institutions gain a greater understanding of their students and insight into how to help solve their problems.

Predictive analytics platforms and Artificial Intelligence technology have already been used with very positive results by different industries to understand user behavior and improve decision-making. With the right data, the same can be applied in classrooms. It helps to personalize and drive better learning outcomes, and to create inclusive and personalized learning experiences, so that each student is empowered to succeed. If implemented correctly, it allows better and closer monitoring of the needs of students, who become the center of learning and enjoy permanent support.
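To make the idea of predictive analytics in the classroom concrete, here is a minimal, hypothetical sketch of a dropout-risk score. Every feature, weight and threshold below is invented for illustration; real platforms of the kind described in the interview learn these from data rather than hard-coding them.

```python
# Toy dropout-risk scoring: a hypothetical sketch of the kind of
# predictive model discussed above. Features and weights are
# invented for illustration only, not taken from any real product.

def dropout_risk(attendance_rate, grade_avg, missed_assignments):
    """Return a risk score in [0, 1]; higher means more at risk."""
    # Normalize each signal to [0, 1], where 1 means "worrying".
    low_attendance = 1.0 - attendance_rate          # attendance_rate in [0, 1]
    low_grades = 1.0 - grade_avg / 10.0             # grade_avg on a 0-10 scale
    backlog = min(missed_assignments / 10.0, 1.0)   # cap at 10 missed items
    # Weighted combination (weights are illustrative, not calibrated).
    return 0.4 * low_attendance + 0.4 * low_grades + 0.2 * backlog

def flag_at_risk(students, threshold=0.5):
    """Return names of students whose risk score exceeds the threshold."""
    return [name for name, feats in students.items()
            if dropout_risk(*feats) > threshold]

students = {
    "ana":  (0.95, 8.5, 0),   # high attendance, good grades
    "luis": (0.55, 4.0, 7),   # low attendance, low grades, backlog
}
print(flag_at_risk(students))  # -> ['luis']
```

A real system would replace the hand-set weights with a model trained on historical outcomes, but the workflow, turning raw indicators into an early-warning flag a teacher can act on, is the same.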

At Microsoft we want to be the ideal travel companion for the digital transformation of the education sector. We offer educational entities the best solutions - cloud and hardware - to prepare students for their professional future, in a complete environment of collaboration and communication for the classroom, in both face-to-face and online models. Solutions like Office 365 Education and the Surface device are designed precisely to drive collaboration both inside and outside the classroom. The educational version of Microsoft Teams makes a virtual classroom possible. It is a free tool for schools and universities that integrates conversations, video calls, content, assignments and applications in one place, allowing teachers to create learning environments that are lively and accessible from mobile devices.

And, in addition, we make available to schools, teachers and students devices expressly designed for the educational environment, such as the Surface Go 2. It is an evolutionary device, that is, it adapts to any educational stage and boosts students' creativity thanks to its power, versatility and safety. This device allows mobility for both teachers and students inside and outside the classroom; connectivity with other peripheral devices (printers, cameras...); and includes the Microsoft Classroom Pen for natural writing and drawing in digital ink.

3. There is increasing demand for digital skills and competencies related to data. In this sense, the National Plan for Digital Skills includes the digitization of education and the development of digital skills for learning. What changes should be made to educational programs in order to promote the acquisition of digital knowledge by students?

Without a doubt, one of the biggest challenges we face today is the lack of training and digital skills. According to a study carried out by Microsoft and EY, 57% of the companies surveyed expect AI to have a high or very high impact in business areas that are "totally unknown to companies today."

There is a clear opportunity for Spain to lead Europe in digital talent, consolidating itself as one of the most attractive countries to attract and retain this talent. A recent LinkedIn study anticipates that two million technology-related jobs will be created in Spain in the next five years, not only in the technology industry but also, and above all, in companies in other sectors of activity that seek to incorporate the talent necessary to carry out their transformation. However, there is a shortage of professionals with skills and training in digital skills. According to data from the Digital Economy and Society Index Report published annually by the European Commission, Spain is below the European average in most of the indicators that refer to the digital skills of Spanish professionals.

There is, therefore, an urgent demand to train qualified talent with digital skills, data management, AI, machine learning... Technology-related profiles are among the most difficult to find and, in the near future, so will be those related to data analytics, cloud computing and application development.

For this, adequate training is necessary, not only in the way of teaching but also in curricular content. Any degree, not just those in the STEM field, should include subjects related to technology and AI, which will define the future. The use of AI reaches every field, not only technology; therefore, students of any type of degree - Law or Journalism, to give some examples of non-STEM careers - need qualified training in technologies such as AI or data science, since they will have to apply them in their professional future.

We must bet on public-private collaboration and involve the technology industry, public administrations, the educational community - adapting university curricula to labor-market reality - and third-sector entities, with the aim of promoting employability and professional retraining. In this way we can train professionals in areas such as quantum computing, Artificial Intelligence or data analytics, and aspire to digital leadership.


4. Even today we find a disparity between the number of men and women who choose professional branches related to technology. What is needed to promote the role of women in technology?

According to the National Observatory of Telecommunications and Information Society -ONTSI- (July 2020), the digital gender gap has been progressively reduced in Spain, going from 8.1 to 1 point, although women maintain an unfavorable position in digital skills and Internet use. In advanced skills, such as programming, the gap in Spain is 6.8 points, the EU average being 8 points. The percentage of researchers in the ICT services sector drops to 23.4%. And in terms of the percentage of graduates in STEM, Spain ranks 12th within the EU, with a difference between the sexes of 17 points.

Without a doubt, there is still a long way to go. One of the main barriers that women face in the technology sector and when it comes to entrepreneurship are stereotypes and cultural tradition. The masculinized environment of technical careers and stereotypes about those who are dedicated to technology make them unattractive careers for women.

Digitization is boosting the economy and promoting business competitiveness, as well as generating an increase in the creation of specialized employment. Perhaps the most interesting thing about the impact of digitization on the labor market is that these new jobs are not only being created in the technology industry, but also in companies from all sectors, which need to incorporate specialized talent and digital skills.

Therefore, there is an urgent demand to train qualified talent with digital capabilities, and this talent must be diverse. Women cannot be left behind. It is time to tackle gender inequality and alert everyone, regardless of gender, to this enormous opportunity. STEM careers are an ideal future option for anyone, regardless of gender.

To promote the female presence in the technology sector, in favor of a digital era without exclusion, at Microsoft we have launched different initiatives that seek to banish stereotypes and encourage girls and young women to take an interest in science and technology, and to make them see that they can also be protagonists of the digital society. In addition to the WONNOW Awards, which we organize with CaixaBank, we also participate and collaborate in many initiatives, such as the Ada Byron Awards together with the University of Deusto, to help give visibility to the work of women in the STEM field, so that they are role models for those who are about to come.


5. How can initiatives such as hackathons or challenges help drive data-driven innovation? How was your experience in the III Aporta Challenge?

These types of initiatives are key to that much-needed change. At Microsoft we are constantly organizing hackathons on a global, regional and local scale, to innovate in different priority areas for the company, such as education.

But we go further: we also use these tools in class. One of Microsoft's bets is Hacking STEM projects. These are projects that mix the "maker" concept of learning by doing with programming and robotics, through the use of everyday materials. What's more, they are made up of activities that allow teachers to guide their students to build scientific instruments and project-based tools to visualize data through science, technology, engineering and mathematics. Our projects - both Hacking STEM and coding and computational language through the use of free tools such as Make Code - aim to bring programming and robotics to any subject in a transversal way and, why not, to learn programming in a Latin class or in a biology one.

My experience in the III Aporta Challenge has been fantastic because it has allowed me to learn about incredible ideas and projects where the usefulness of the wealth of available data becomes a reality and is put at the service of improving education for all. Participation was high and the presentations were very careful and well worked. I would like to take this opportunity to thank everyone who participated and also congratulate the winners.

6. A year ago, Microsoft launched a campaign to promote open data in order to close the gap between countries and companies that have the necessary data to innovate and those that do not. What has the project consisted of? What progress has been made?

Microsoft's global initiative Open Data Campaign seeks to help close the growing "data gap" between the small number of technology companies that benefit most from the data economy today and other organizations that are hampered by lack of access to data, or by lack of capabilities to use the data they already have.

Microsoft believes that more needs to be done to help organizations share and collaborate around data so that businesses and governments can use it to meet the challenges they face, as the ability to share data has huge benefits. And not only for the business environment: data also plays a critical role in helping us understand and address major challenges, such as climate change, or health crises, such as the COVID-19 pandemic. To take full advantage of data, it is necessary to develop the ability to share it in a safe and reliable way, and to allow it to be used effectively.

Within the Open Data Campaign initiative, Microsoft has announced five principles that guide how the company itself approaches sharing its data with others:

  • Open: Microsoft will work to make relevant data on large social issues as open as possible.
  • Usable: Microsoft will invest in creating new technologies and tools, governance mechanisms and policies so that data can be used by everyone.
  • Empowering: Microsoft will help organizations generate value from their data and develop AI talent to use it effectively.
  • Secure: Microsoft will employ security controls to ensure data collaboration is secure at the operational level.
  • Private: Microsoft will help organizations protect the privacy of individuals in data-sharing collaborations that involve personally identifiable information.

We continue to make progress in this regard. Last year, Microsoft Spain, together with Foundation 29, the Microsoft-Universitat de València Chair on Privacy and Digital Transformation and the legal advice of the law firm J&A Garrigues, created the "Health Data" Guide, which describes the technical and legal framework for creating a public repository of health-system data so that it can be shared and used in research environments. LaLiga is one of the entities that has shared its anonymized data, in June of this year.

Data is the beginning of everything, and one of our biggest responsibilities as a technology company is to help conserve the ecosystem on a large scale, at a planetary level. Here the greatest challenge is to consolidate not only all the available data, but also the artificial intelligence algorithms that provide access to it and allow decision-making, building predictive models and scenarios with updated information from multiple sources. For this reason, Microsoft launched the concept of the Planetary Computer, based on Open Data, to make more than 10 Petabytes of data - and growing - available free of charge to scientists, biologists, startups and companies, from multiple sources (biodiversity, electrification, forestry, biomass, satellite), together with APIs, development environments and applications (predictive models, etc.) to create a greater impact for the planet.


7. They also offer some open data sets through their Azure Open Datasets initiative. What kind of data do they offer? How can users use them?

This initiative helps companies improve the accuracy of their Machine Learning models' predictions and reduce data-preparation time, thanks to curated, ready-to-use public datasets that are easily accessible from Azure services.

There is data of all kinds - health and genomics, transport, labor and economy, population and safety, common data... - that can be used in multiple ways. And it is also possible to contribute datasets to the community.

8. What are Microsoft's future plans for open data?

After a year of the Open Data Campaign, we have learned a great deal and, in collaboration with our partners, we are going to focus next year on practical aspects that make the process of data sharing easier. We have just started publishing materials so that organizations can see the nuts and bolts of how to start sharing data. We will continue to identify possible collaborations to solve social challenges on issues of sustainability, health, equity and inclusion. We also want to connect those who are working with data, or want to explore that realm, with the opportunities offered by the Microsoft Certifications in Data and Artificial Intelligence. And, above all, this issue requires a good regulatory framework and, for this, it is necessary that those who define policies meet with the industry.

Blog

Artificial intelligence is increasingly present in our lives. However, its presence is increasingly subtle and unnoticed. As a technology matures and permeates society, it becomes more and more transparent, until it becomes completely naturalized. Artificial intelligence is rapidly going down this path, and today, we tell you about it with a new example.

Introduction

In this communication and dissemination space we have often talked about artificial intelligence (AI) and its practical applications. On other occasions, we have published monographic reports and articles on specific applications of AI in real life. It is clear that this is a highly topical subject with great repercussions in the technology sector, and that is why we continue to focus our informative work on this field.

On this occasion, we talk about the latest advances in artificial intelligence applied to the field of natural language processing. In early 2020 we published a report in which we cited the work of Paul Daugherty and James Wilson - Human + Machine - to explain the three states in which AI collaborates with human capabilities. Daugherty and Wilson explain these three states of collaboration between machines (AI) and humans as follows (see Figure 1). In the first state, AI is trained with genuinely human characteristics such as leadership, creativity and value judgments. In the opposite state, characteristics where machines demonstrate better performance than humans are highlighted. We are talking about repetitive, precise and continuous activities. However, the most interesting state is the intermediate one. In this state, the authors identify activities or characteristics in which humans and machines perform hybrid activities, in which they complement each other. In this intermediate state, in turn, two stages of maturity are distinguished.

  • In the first stage - the most immature - humans complement machines. We have numerous examples of this stage today. Humans teach machines to drive (autonomous cars) or to understand our language (natural language processing).
  • The second stage of maturity occurs when AI empowers or amplifies our human capabilities. In the words of Daugherty and Wilson, AI gives us humans superpowers.

Collaborative states between humans and machines. Human-only states are: leadership, creativity, value judgments. Machine-human hybrid activities include those that we humans teach machines (driving, reading, speaking) and those where machines give us superpowers (seeing better, reading more, speaking in other languages). Finally, we have the AI-only stages, for repetition, precision and continuous effort tasks, known as Dull-Dirty-Dangerous (DDD).

Figure 1: States of human-machine collaboration. Original source.

In this post, we show you an example of this superpower conferred by AI: the superpower of summarizing books from tens of thousands of words to just a few hundred. The resulting summaries are similar to those a human would produce, with the difference that the AI does it in a few seconds. Specifically, we are talking about the latest advances published by OpenAI, a company dedicated to research in artificial intelligence systems.

Summarizing books as a human

OpenAI frames the problem in terms similar to Daugherty and Wilson's reasoning on models of AI collaboration with humans. The authors of the latest OpenAI paper explain that, in order to implement AI models powerful enough to solve global and genuinely human problems, we must ensure that those models act in alignment with human intentions. This challenge is known as the alignment problem.

The authors explain that: To test scalable alignment techniques, we train a model to summarize entire books [...] Our model works by first summarizing small sections of a book, then summarizing those summaries into a higher-level summary, and so on.

Let's look at an example.

The authors have refined the GPT-3 algorithm to summarize entire books based on an approach known as recursive task decomposition, accompanied by reinforcement learning from human feedback. The technique is called recursive decomposition because it is based on making multiple summaries of the complete work (for example, a summary for each chapter or section) and, in subsequent iterations, making summaries of the previous summaries, each time with a smaller number of words. The following figure explains the process more visually.

Romeo and Juliet by William Shakespeare has 25,433 words. The AI first produces 72 summaries of individual sections, then condenses them into 7 higher-level summaries and, finally, a single summary of 119 words.

Original source: https://openai.com/blog/summarizing-books/

Final result:

Example of the final summary in English with 119 words

Original source: https://openai.com/blog/summarizing-books/
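The recursive decomposition loop described above can be sketched in a few lines of Python. Note that `summarize_chunk` below is a deliberately trivial placeholder (it keeps the first few words of each chunk) standing in for the fine-tuned GPT-3 model; only the recursive control flow - split into chunks, summarize each, then summarize the summaries - mirrors the approach in the paper.

```python
# Sketch of recursive task decomposition for book summarization.
# OpenAI uses a fine-tuned GPT-3 model as the summarizer; here the
# summarizer is a trivial placeholder so the structure can run end to end.

def summarize_chunk(text, keep=10):
    """Placeholder summarizer: keep the first `keep` words of the chunk."""
    return " ".join(text.split()[:keep])

def split_into_chunks(text, max_words):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def recursive_summary(text, max_words=50):
    """Summarize chunks, then summaries of summaries, until it fits."""
    if len(text.split()) <= max_words:
        return text
    chunks = split_into_chunks(text, max_words)
    # Summarize each chunk, then recurse on the concatenated summaries.
    summaries = " ".join(summarize_chunk(c) for c in chunks)
    return recursive_summary(summaries, max_words)

# A toy "book" of 500 words is reduced to at most 50 words.
book = " ".join(f"Sentence {i} of the book." for i in range(1, 101))
summary = recursive_summary(book, max_words=50)
print(len(summary.split()))
```

In the real system each level of summaries is also scored by human raters, and that feedback is used to improve the model; the sketch only reproduces the decomposition itself.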

As we have mentioned before, the GPT-3 algorithm has been trained thanks to the set of books digitized under the umbrella of Project Gutenberg. The vast Project Gutenberg repository includes up to 60,000 books in digital format that are currently in the public domain in the United States. Just as Project Gutenberg has been used to train GPT-3 in English, other open data repositories could be used to train the algorithm in other languages. In Spain, the National Library has an open data portal through which the available catalog of public-domain works in Spanish can be exploited.

The authors of the paper state that recursive decomposition has certain advantages over more comprehensive approaches that try to summarize the book in a single step.

  1. Human evaluation of summary quality is easier when assessing summaries of specific parts of a book than when assessing the entire work.
  2. A summary always tries to identify the key parts of a book or a chapter of a book, keeping the fundamental data and discarding those that do not contribute to the understanding of the content. Evaluating this process to understand if those fundamental details have really been captured is much easier with this approach based on the decomposition of the text into smaller units.
  3. This decompositional approach mitigates the limitations that may exist when the works to be summarized are very large.

In addition to the main example we have presented in this post on Shakespeare's Romeo and Juliet, readers can experience for themselves how this AI works in the OpenAI summary browser. This website makes available two open repositories of books (classic works) on which one can explore the summarization capabilities of this AI, navigating from the final summary of the book back to the earlier summaries in the recursive decomposition process.

In conclusion, natural language processing is a key human capability that is being dramatically enhanced by the development of AI in recent years. It is not only OpenAI that is making major contributions in this field. Other technology giants, such as Microsoft and NVIDIA, are also making great strides as evidenced by the latest announcement from these two companies and their new Megatron-Turing NLG model.  This new model shows great advances in tasks such as: the generation of predictive text or the understanding of human language for the interpretation of voice commands in personal assistants. With all this, there is no doubt that we will see machines doing incredible things in the coming years.


Content prepared by Alejandro Alija, expert in Digital Transformation and Innovation.

The contents and views expressed in this publication are the sole responsibility of the author.
