Google backs open data

News date: 30-10-2019

Google data search

The tech giant Google has recently expressed its interest in open source and open data. On the premise that open data is "good not only for us and our industry, but also benefit the world at large", the company says it is committed to sharing data, services and software with citizens.

This policy has led Google to open data sets and make them accessible through APIs or tools that facilitate their use by individuals and organizations.

Data opening

Google has currently made more than 60 standardized, machine-readable data sets available to users, intended for use by machine learning systems. These data sets are accompanied by supporting materials aimed at developers and researchers interested in working with image collections, annotated video corpora, high-granularity data, etc. One example is Facets, a tool that helps analyze the composition of a data set and evaluate the best ways to use it.

Google is also working to improve quality and create more representative data sets through interfaces such as Crowdsource, an application that takes advantage of the work of user communities. With this application, users can check labels, perform and validate translations, or help improve sentiment analysis systems.

Location and analysis of open data

But opening data is not enough; it also has to be easy to find. To that end, Google offers Google Dataset Search, a search engine that helps locate open data in hundreds of repositories associated with international institutions, such as the World Bank or the European Data Portal, as well as in official catalogues maintained by governments worldwide. Of course, the data must be described in such a way that search engines can locate it.
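As an illustration of what "described in such a way that search engines can locate it" can mean in practice, the sketch below builds a schema.org Dataset description in JSON-LD, the kind of structured metadata that dataset search engines can read from a web page. The helper names (dataset_jsonld, as_script_tag), the example.org URL and the sample values are hypothetical and chosen only for illustration; which fields a given search engine actually requires should be checked against its own documentation.

```python
import json


def dataset_jsonld(name, description, url, license_url, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper: the field selection is an assumption, not an
    official list of required properties.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": list(keywords),
    }


def as_script_tag(metadata):
    """Wrap the metadata in the script tag used to embed JSON-LD in HTML."""
    body = json.dumps(metadata, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Sample values are placeholders, not a real data set.
example = dataset_jsonld(
    name="Air quality measurements 2019",
    description="Hourly air quality readings from municipal stations.",
    url="https://example.org/datasets/air-quality-2019",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["air quality", "environment", "open data"],
)

print(as_script_tag(example))
```

A publisher would place the printed tag in the HTML of the page that presents the data set, so that crawlers can pick up its name, description and license.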

To help analyze and extract value from this data, users have at their disposal Data Commons, a knowledge graph of data sources that allows researchers and students to work with several data sets at once, regardless of source and format, as if they were all in a single local database.

As additional and necessary complements, Google also supports communities of data scientists (Kaggle), offers training courses on the subject, launches challenges that encourage the community to use data to solve specific problems, and continuously launches campaigns aimed at making this new resource increasingly abundant.

Google's commitment to open data also requires a strategy that takes into account key aspects such as confidentiality and privacy. Google claims to have several mechanisms to safeguard them, such as Federated Learning, a technique for training global machine learning models without the data leaving a person's device.
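The idea behind Federated Learning can be sketched with a toy example. The code below is a minimal, federated-averaging-style illustration and not Google's implementation: each simulated device fits a small linear model on its own private points, and only the resulting model weights are sent back and averaged. All names (local_update, federated_average, train), the learning rate, the number of rounds and the sample data are assumptions made for illustration; real systems add further protections, such as secure aggregation, that are omitted here.

```python
def local_update(weights, data, lr=0.02, epochs=20):
    """Gradient descent for y = w*x + b using only one device's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error on the local data.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average device models, weighted by how many examples each one saw."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train(devices, rounds=200):
    """Server loop: broadcast the global model, collect and average updates."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each device trains locally; raw (x, y) pairs never reach the server.
        updates = [local_update(global_weights, data) for data in devices]
        sizes = [len(data) for data in devices]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two simulated devices holding private samples of the line y = 2x + 1.
devices = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

w, b = train(devices)
print("learned slope:", round(w, 3), "learned intercept:", round(b, 3))
```

The server in this sketch only ever sees pairs of weights, never the individual data points, which is the property the article attributes to the technique.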