Taxonomies and Thesauri: Knowledge Organization Tools

News date: 06-10-2017

open data, ontology, thesauri

Several tools exist for organizing (classifying, describing, indexing) knowledge. They are summarized below, ordered from the simplest (least formalized, with the fewest rules) to the most complex (most formalized, with the most rules):

  1. Controlled vocabularies
  2. Taxonomies
  3. Thesauri
  4. Ontologies

A controlled vocabulary is a simple list of terms, each assigned a specific meaning. The terms are agreed on in advance and are used to describe knowledge.

One example is the list of Spanish provinces (Asturias, Illes Balears, Valladolid, and so on), which can be used to label any document. The list is described in Annex V of Document BOE-A-2013-2380, the “Technical Interoperability Standard for the Reuse of Information Resources”.
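As a minimal sketch of how a controlled vocabulary is used in practice, the snippet below checks document labels against a fixed list. It assumes only the three provinces named above (the list in Annex V is longer), and the function name `validate_labels` is hypothetical.

```python
# A controlled vocabulary is just a closed list of agreed terms.
# Assumption: only the three provinces quoted in the text are listed here;
# the real list in Annex V is longer.
PROVINCES = frozenset({"Asturias", "Illes Balears", "Valladolid"})


def validate_labels(labels):
    """Split labels into (accepted, rejected) according to the vocabulary."""
    accepted = [label for label in labels if label in PROVINCES]
    rejected = [label for label in labels if label not in PROVINCES]
    return accepted, rejected


# "Baleares" is rejected because the vocabulary's term is "Illes Balears".
accepted, rejected = validate_labels(["Asturias", "Baleares", "Valladolid"])
print(accepted)
print(rejected)
```

The check is deliberately strict: a label either matches a listed term exactly or is rejected, which is what keeps the vocabulary "controlled".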

A taxonomy is a controlled vocabulary whose terms are organized hierarchically, in a tree structure that runs from the most general terms to the most specific ones, with related terms grouped under the same parent.

An example is the taxonomy of primary sectors and the areas related to each one: the primary sector “Environment”, for instance, includes the areas “Meteorology”, “Geography” and “Conservation of fauna and flora”. These primary sectors are used to describe datasets in the datos.gob.es data catalog. The taxonomy is defined in Annex IV of Document BOE-A-2013-2380, the “Technical Interoperability Standard for the Reuse of Information Resources”.
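The tree structure of a taxonomy can be sketched with a plain parent-to-children mapping. The example below uses only the “Environment” sector and the three areas mentioned above; the names `CHILDREN`, `PARENT`, `ancestors` and `descendants` are illustrative, not part of any standard.

```python
# Taxonomy as a tree: each general term maps to its more specific terms.
# Assumption: only the single branch quoted in the text is modelled here.
CHILDREN = {
    "Environment": [
        "Meteorology",
        "Geography",
        "Conservation of fauna and flora",
    ],
}

# Reverse index: each specific term points to its (single) parent.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def ancestors(term):
    """Return the chain of broader terms, nearest first, up to the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def descendants(term):
    """Return every term below `term` in the tree, depth first."""
    found = []
    for child in CHILDREN.get(term, []):
        found.append(child)
        found.extend(descendants(child))
    return found


print(ancestors("Meteorology"))
print(descendants("Environment"))
```

With this structure, a dataset labelled with a specific area can also be found by searching for its sector, simply by walking up the tree.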

A thesaurus is a taxonomy with some “extra” relationships:

  • Equivalence (synonymy) or preference relationships: between the preferred term (PT), or descriptor, and the non-preferred term (NPT).
  • Whole-part or class-subclass hierarchical relationships: between broader terms (BT) and narrower terms (NT).
  • Associative relationships: between related terms (RT) that are linked pragmatically rather than hierarchically or by synonymy.

AGROVOC is a thesaurus that organizes concepts related to FAO’s fields of interest: mainly agriculture, but also food, nutrition, fisheries, forestry and the environment. For example, the concept “Fish farms” sits under the broader concept “Land” (which has the synonym “farm” and other relationships of its own).
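The three kinds of thesaurus relationship can be sketched as a small data structure. This is an illustration only: the `Concept` and `Thesaurus` classes are hypothetical, and the terms are copied from the example above rather than checked against the AGROVOC dataset itself.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    preferred: str                                     # PT (descriptor)
    non_preferred: set = field(default_factory=set)    # NPT (synonyms)
    broader: set = field(default_factory=set)          # BT
    narrower: set = field(default_factory=set)         # NT
    related: set = field(default_factory=set)          # RT


class Thesaurus:
    def __init__(self):
        self.concepts = {}   # preferred term -> Concept
        self._entry = {}     # any label, lowercased -> preferred term

    def add(self, preferred, non_preferred=()):
        concept = Concept(preferred, set(non_preferred))
        self.concepts[preferred] = concept
        for label in (preferred, *non_preferred):
            self._entry[label.lower()] = preferred
        return concept

    def link_broader(self, narrow, broad):
        """Record a hierarchical BT/NT pair in both directions."""
        self.concepts[narrow].broader.add(broad)
        self.concepts[broad].narrower.add(narrow)

    def link_related(self, a, b):
        """Record a symmetric associative (RT) relationship."""
        self.concepts[a].related.add(b)
        self.concepts[b].related.add(a)

    def resolve(self, label):
        """Map any label (PT or NPT) to its preferred term, or None."""
        return self._entry.get(label.lower())


# Terms taken from the article's example, not verified against AGROVOC.
t = Thesaurus()
t.add("Land", non_preferred=["farm"])
t.add("Fish farms")
t.link_broader("Fish farms", "Land")

print(t.resolve("farm"))
print(t.concepts["Fish farms"].broader)
```

Resolving a non-preferred term to its descriptor is what lets an indexer type a synonym and still end up with the single agreed label.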

An ontology, the next step in the organization of knowledge, is a formal definition of the types, properties and relationships of the concepts in a particular domain of discourse. A formal definition here means one that encodes knowledge using formal logic, as a collection of assertions, so that a machine can process it and infer new knowledge.

An example of an ontology is FOAF (Friend Of A Friend), which describes people, their activities and their relationships with other people and objects (“Ana knows Águeda”, “Ana’s email is <ana@example.org>”, and so on).
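To illustrate what "inferring new knowledge" from assertions can look like, here is a minimal sketch in plain Python rather than a real RDF toolkit. The two statements about Ana are written as subject-predicate-object triples, and a single domain/range rule derives type facts that were never stated. The `ex:` identifiers are hypothetical, and the domain and range entries reflect one reading of the FOAF vocabulary (`foaf:knows` relating persons, `foaf:mbox` belonging to an agent); treat them as assumptions.

```python
RDF_TYPE = "rdf:type"

# Asserted facts, as (subject, predicate, object) triples.
# The ex: identifiers are made up for this illustration.
triples = {
    ("ex:Ana", "foaf:knows", "ex:Agueda"),
    ("ex:Ana", "foaf:mbox", "mailto:ana@example.org"),
}

# Schema knowledge (assumed from the FOAF vocabulary): which class the
# subject (domain) and object (range) of each property belong to.
DOMAIN = {"foaf:knows": "foaf:Person", "foaf:mbox": "foaf:Agent"}
RANGE = {"foaf:knows": "foaf:Person"}


def infer_types(facts):
    """Apply the domain/range rule once and return only the new triples."""
    inferred = set()
    for subject, predicate, obj in facts:
        if predicate in DOMAIN:
            inferred.add((subject, RDF_TYPE, DOMAIN[predicate]))
        if predicate in RANGE:
            inferred.add((obj, RDF_TYPE, RANGE[predicate]))
    return inferred - facts


# Nobody asserted that Agueda is a person; it follows from "Ana knows Agueda".
for triple in sorted(infer_types(triples)):
    print(triple)
```

A controlled vocabulary, taxonomy or thesaurus could not support this step, because none of them states what a relationship implies about the things it connects.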

There is no golden rule for selecting which tool to use. In each case we have to choose the one whose level of complexity best fits the task, always preferring the simplest option that works. This follows the KISS principle (Keep It Simple, Stupid!), which holds that simplicity should be a key design objective and that unnecessary complexity should be avoided. It is also worth remembering that, before creating a new knowledge organization tool, the first step is to check whether one already exists that suits the proposed task and can be reused.