Google as a reuser of open data
News date: 10-08-2021

The technology giant Google's commitment to open data has been evident in various initiatives in recent years. On the one hand, it launched the search engine Google Dataset Search, which makes it easier to find open data published in hundreds of repositories of international institutions and governments, among them datos.gob.es. On the other, it launched its own open data initiative, which offers standardized, machine-readable datasets for use by machine learning systems. This latter initiative is part of Google Research, Google's portfolio of research and innovation projects, which range from predicting the spread of COVID-19 to designing algorithms and training machine translation for a growing number of languages, among others. In these and other projects, Google has not only published datasets; the company itself also acts as a reuser of public data. In this post we look at some examples of Google solutions and projects that integrate open data into their operations.
Google Earth
Through a virtual globe based on satellite images, Google Earth lets users view a variety of maps. Users can explore territories in 3D and add markers or draw lines and areas, among other tools.
One of its latest updates is the Timelapse feature, which integrates 24 million satellite photos captured over 37 years (specifically, between 1984 and 2020). This makes it possible to observe changes in different regions of the planet. Among other things, the tool shows changes in forest cover, urban growth and the warming of the planet, which helps raise awareness of the climate crisis we are experiencing so that we can act accordingly. It is therefore a valuable resource for environmental education, with great potential for classroom use.
The integrated data comes from the Landsat program of the United States Geological Survey and from the European Union's Copernicus program and its Sentinel satellites. In total, 20 petabytes of satellite images have been made available to users as a single large, high-resolution video mosaic, which required more than 2 million hours of processing. It should be noted that both the Copernicus and Landsat data are open for reuse by any individual or company that wishes to launch its own services and products.
Google Translate
Another of the technology giant's best-known tools is its translator, which was launched in 2006. Ten years later it was updated with the Google Neural Machine Translation System (GNMT), which uses more modern machine learning techniques for its training.
Google does not make public the exact data it uses to train the system, although in its report Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, the authors do note that they ran benchmark tests with two publicly available corpora: WMT'14 English-to-French and WMT'14 English-to-German.
Although much progress has been made, the system still does not match the quality of a translation by a human expert in the field, especially for less widely spoken languages, so adjustments and improvements continue to be made. Another area that needs further work is bias in the data used to train the system, which can lead to stereotypes. For example, it has been found that the translator introduces bias when choosing between masculine and feminine forms in translations from languages with neutral, non-gendered forms, like English or Hungarian. In these cases, the feminine is used by default for tasks related to care and beauty, and the masculine for more highly regarded professions. The tech giant has indicated that it is already working to resolve this problem.
Other examples from Google Research
Within the aforementioned Google Research, different projects are carried out, some of them closely linked to reuse. For example, in the context of the current pandemic, the company publishes mobility reports with anonymized information on movement trends, which can be downloaded in CSV format. These reports make it possible to understand the impact of movement restriction policies, as well as to make economic forecasts. The data has also been used by Google's own teams of data scientists to predict the spread of COVID-19 using graph neural networks instead of traditional time series-based models.
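As an illustration of how a reuser might start working with one of these CSV downloads, here is a minimal Python sketch that averages one mobility indicator per region. It is not taken from Google's documentation: the column names are assumptions that should be checked against the header of the downloaded file, the function name is hypothetical, and the sample rows are invented placeholders rather than real measurements.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Assumed column names; verify them against the header of the file you download.
REGION_COL = "country_region"
METRIC_COL = "retail_and_recreation_percent_change_from_baseline"


def average_change_by_region(csv_file, metric=METRIC_COL):
    """Return {region: mean percent change} for one metric, skipping blank cells."""
    values = defaultdict(list)
    for row in csv.DictReader(csv_file):
        raw = (row.get(metric) or "").strip()
        if raw:  # blank cells are ignored rather than treated as zero
            values[row[REGION_COL]].append(float(raw))
    return {region: mean(vals) for region, vals in values.items()}


# Invented placeholder rows, only to show the expected shape of the input.
SAMPLE = """country_region,date,retail_and_recreation_percent_change_from_baseline
Region A,2020-03-01,-10
Region A,2020-03-02,-30
Region B,2020-03-01,
Region B,2020-03-02,-5
"""

if __name__ == "__main__":
    print(average_change_by_region(io.StringIO(SAMPLE)))
```

With a real download, the same function could be called on an open file object instead of the in-memory sample. Real files may include additional columns and several levels of geographic aggregation, so some filtering would probably be needed before averaging.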
They have also developed projects in the field of weather prediction, producing estimates for increasingly specific areas (it is no longer just a question of whether it will rain in my city, but whether there will be rainfall in my neighbourhood). For this, they use resources from NOAA (National Oceanic and Atmospheric Administration) and a new technique called HydroNets, based on a network of neural networks that models the world's real river systems.
You can find more information about the latest advances in Google Research in this article.
All these examples show that open data is not only a source of innovative solutions for entrepreneurs and small companies; large companies also take advantage of its potential to develop services and products that become part of their portfolios.
Content prepared by the datos.gob.es team.