Wikidata, a free and open knowledge database

News date: 23-02-2021

There was a time, not so long ago, when a disagreement over a fact could lead to a long conversation between friends and family. Now, those discussions are settled with a quick internet search: how old a celebrity is, which football team won a particular competition or who was in power at a particular moment in history. It's all online.

When searching for the answer to any of these questions on the internet, the first result that usually comes up is Wikipedia, the famous free online encyclopaedia. Wikipedia is a project of the Wikimedia Foundation, a non-profit organisation whose aim is to bring knowledge to everyone on the planet. Wikipedia is the best-known project, but Wikimedia offers many other services such as Wikiversity, which supports learning communities by providing teaching materials and activities, or Wiktionary, a free dictionary.

Going back to Wikipedia, this encyclopaedia offers information created and edited by volunteers from all over the world based on three principles: neutral point of view, verifiability and no original research. If you open an article about any organisation or person, you will see a table on the right-hand side with a series of data. Initially, this data had to be entered manually for each entry and translated into each language. Considering that Wikipedia contains more than 50 million articles in more than 300 languages, the manual task was complicated, to say the least. This changed with the arrival of Wikidata.

 

Example of Wikipedia and Wikidata entry

What is Wikidata?

Wikidata is a collaborative, free and open knowledge base that stores structured information. Its main advantage is that it offers linked data, described using RDF, which allows data to be linked to other datasets in other digital repositories.

Wikidata can be read and edited by both humans and machines, and it integrates data sources published under public domain licences compatible with Creative Commons CC0. As a result, all content can be reused by any person or company that wishes to do so.

Wikidata takes advantage of the Semantic Web to create a linked database of universal knowledge. When editing Wikipedia or any other Wikimedia wiki, the user can load data dynamically from Wikidata. In this way, statistics, dates or locations can be managed centrally. When a piece of data is changed in Wikidata, it is automatically updated in all articles where it appears and in all languages.
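
The idea of managing data centrally can be illustrated with a toy sketch. This is not how Wikidata or MediaWiki are implemented; the identifiers `Q_TOWN` and `P_POPULATION`, the numbers and the function `render_data_table` are all made up for illustration. It only shows why a single change in a central store is reflected in every language version that reads from it.

```python
# Central store: one language-independent record per item.
# "Q_TOWN" and "P_POPULATION" are made-up placeholders, not real Wikidata IDs.
CENTRAL_DATA = {"Q_TOWN": {"P_POPULATION": 1000}}

# Property labels per language (a tiny hand-written subset for illustration).
PROPERTY_LABELS = {
    "P_POPULATION": {"en": "population", "es": "población"},
}


def render_data_table(item_id, language):
    """Build the rows of an article's data table from the central store."""
    rows = []
    for prop, value in CENTRAL_DATA[item_id].items():
        label = PROPERTY_LABELS[prop][language]
        rows.append(f"{label}: {value}")
    return rows


before_en = render_data_table("Q_TOWN", "en")
before_es = render_data_table("Q_TOWN", "es")

# A single edit in the central store...
CENTRAL_DATA["Q_TOWN"]["P_POPULATION"] = 1050

# ...is picked up by every language version the next time it is rendered.
after_en = render_data_table("Q_TOWN", "en")
after_es = render_data_table("Q_TOWN", "es")
```

Because the value is stored once and only the labels vary by language, no per-language copy has to be corrected by hand.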

Wikidata should not be confused with DBpedia. DBpedia is essentially a repository of data extracted from Wikipedia, together with the relations between them, expressed as RDF triples and hosted in the Linked Open Data cloud. So, while DBpedia extracts data from Wikipedia infoboxes, Wikidata provides referenced structured data that anyone can edit. In other words, it turns DBpedia's extraction process on its head: instead of extracting structured data from infoboxes, it allows infoboxes to be created from structured data. In this sense, the two projects can be seen as complementary.

How does Wikidata work?

The Wikidata repository consists mainly of items, each of which has a label and a description. Items are uniquely identified by a Q followed by a number, e.g. Biblioteca Nacional de España (Q750403). Each item is described by a series of properties (identified by a P followed by a number). A statement is a triple of an item ID (Q), a property ID (P) and a value. For example, the statement that Miguel de Cervantes (Q5682) is the author (P50) of "El Ingenioso Hidalgo..." uses the author property, and the value, in this case, is another item (Q).
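
The triple structure described above can be sketched as a small data type. This is a simplified illustration, not Wikidata's actual data model (which also includes qualifiers, references and ranks). The class `Statement` and the helper functions are hypothetical names. Q5682 comes from the text; P31 ("instance of"), Q5 ("human") and P569 ("date of birth") are identifiers added here as examples.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Statement:
    """One statement: an item ID, a property ID and a value."""
    item: str               # e.g. "Q5682" (Miguel de Cervantes, from the text)
    prop: str               # e.g. "P31"
    value: Union[str, int]  # another item ID or a literal


def is_item_id(value: object) -> bool:
    """Return True if value looks like an item ID: 'Q' followed by digits."""
    return isinstance(value, str) and value.startswith("Q") and value[1:].isdigit()


def is_property_id(value: object) -> bool:
    """Return True if value looks like a property ID: 'P' followed by digits."""
    return isinstance(value, str) and value.startswith("P") and value[1:].isdigit()


statements = [
    # instance of -> human: the value is itself an item
    Statement("Q5682", "P31", "Q5"),
    # date of birth, simplified here to a plain year string (a literal value)
    Statement("Q5682", "P569", "1547"),
]

# Statements whose value points to another item rather than to a literal.
item_valued = [s for s in statements if is_item_id(s.value)]
```

Distinguishing item-valued statements from literal-valued ones matters because only the former link one item to another.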

Tutorials are available for users to create new items and statements. There are many considerations when contributing data, files or other resources to Wikidata projects, some of which can be seen in this table.

How can the data be accessed?

The data can be accessed through referenceable URIs following the linked data standards, or through the MediaWiki API.
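
As a minimal sketch of both access routes, the following Python builds the addresses for an item and, optionally, fetches them. It assumes the concept URI pattern `http://www.wikidata.org/entity/<ID>`, the `Special:EntityData` page for JSON output and the `wbgetentities` action of the MediaWiki API. The `USER_AGENT` string and the function names are placeholders chosen for this example.

```python
import json
import urllib.parse
import urllib.request

# Placeholder identifier for this script; replace it with your own details.
USER_AGENT = "wikidata-access-sketch/0.1 (example script)"


def concept_uri(item_id):
    """Referenceable linked-data URI that identifies the item."""
    return f"http://www.wikidata.org/entity/{item_id}"


def entity_data_url(item_id):
    """URL that serves the item's data as JSON."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.json"


def api_url(item_id, language="en"):
    """MediaWiki API URL asking only for the item's label in one language."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download a URL and decode the JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def english_label(item_id):
    """Look up the English label of an item through the API."""
    data = fetch_json(api_url(item_id))
    return data["entities"][item_id]["labels"]["en"]["value"]
```

Calling `english_label("Q5682")` would query the live service, so it needs an internet connection. The URL builders work offline.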

Wikidata also provides a SPARQL endpoint, which can be used through a web interface as well. This allows the user to extract any type of data by querying the semantic data. A tutorial and a query helper are available for users who need them.

The Wikidata SPARQL endpoint can be used not only to retrieve data, but also to display it as maps, diagrams, timelines, lists and other convenient visualisations.
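
As a rough illustration of querying the SPARQL endpoint from a script, the sketch below builds a query for works by a given author and sends it over HTTP. It assumes the public endpoint at `https://query.wikidata.org/sparql` and its predefined `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes. The property defaults to P50, assumed here to be Wikidata's item-valued "author" property, and can be overridden. The function names and the `USER_AGENT` string are placeholders.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder identifier for this script; replace it with your own details.
USER_AGENT = "wikidata-sparql-sketch/0.1 (example script)"


def build_author_query(author_qid, author_property="P50", limit=10):
    """Build a SPARQL query listing works linked to an author item."""
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:{author_property} wd:{author_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query):
    """Encode the query as a GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{SPARQL_ENDPOINT}?{params}"


def run_query(query):
    """Send the query and return the result rows (requires network access)."""
    request = urllib.request.Request(
        build_request_url(query),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


# Query for works whose author is Miguel de Cervantes (Q5682, from the text).
cervantes_query = build_author_query("Q5682")
```

Passing `cervantes_query` to `run_query` would contact the live service. The same query text can also be pasted into the web interface.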

A practical example of the use of Wikidata data is the Monumental application, created to browse and consult information on built historical heritage. Another notable example is the WikiCite initiative to create a database of scientific bibliographies based on Wikidata information, with applications such as Scholia.

A successful project sustainable over time

Wikidata has shown great potential since its inception. It was one of the reasons behind the closure of Freebase, Google's free platform, which stored data on more than 46 million topics. Before it closed, users had the opportunity to export their data to Wikidata, which emerged as the clear winner of the contest.

Wikidata offers numerous advantages, not only for keeping Wikipedia up to date automatically, but also for data professionals, who have a vast knowledge base at their disposal.

Wikipedia celebrated its 20th anniversary in January this year. Wikidata, launched in 2012, is considerably younger, but it is sure to go much further and to become more and more important in its own right.


Content prepared by the datos.gob.es team based on the information shared by Ismael Olea