AGROVOC: Thesaurus for the classification of agricultural knowledge

Post date: 13-09-2017

AGROVOC, linked data, open data

One of the major concerns of international organisations is the inconsistent categorisation and indexing of the knowledge created by each of their component parts (such as agencies, units, departments, venues or programmes). For example, when a person in one department wants to find content on a particular topic across the whole organisation, they usually have to search each repository separately, using the keywords defined for that repository (which may vary from one to the next). Ideally they would use the same keywords everywhere, or even run a single search if a system integrating all the repositories exists. To solve this problem and make it easier for the parts of the organisation to exchange information, it is customary to create controlled vocabularies for classifying the content they generate.

The Food and Agriculture Organisation of the United Nations (FAO) was in this situation, and so in the early 1980s it created AGROVOC.

AGROVOC is a thesaurus that organises concepts related to the FAO's areas of interest: mainly agriculture, but also food, nutrition, fisheries, forestry and the environment.

This thesaurus (and therefore controlled vocabulary) comprised more than 33,000 terms as of August 2017. It has evolved considerably over time: from 3 languages to 23 (Spanish being one of the first three), and from a print-only publication to a SKOS-XL concept scheme.

In addition, AGROVOC is available as a Linked Open Data (LOD) set aligned with 16 other multilingual knowledge organisation systems, such as DBpedia, EuroVoc, the thesaurus of the U.S. National Agricultural Library (NAL) and the CAB Thesaurus from the Centre for Agriculture and Bioscience International (CABI).

In fact, several organisations are working together on projects such as the Global Agricultural Concept Scheme (GACS), which explores the possibility of creating a shared thesaurus of agricultural concepts and terminology by reusing the AGROVOC, CABI and NAL thesauri.

The management of AGROVOC is shared. The FAO is responsible for its publication and final revision, while a community of external organisations and experts from different areas of knowledge is responsible for editing it: proposing new concepts, extending the terminology of existing concepts into other languages, and reviewing and maintaining the terminology already created. To carry out this editing work, the community uses VocBench, an open source vocabulary management tool.

Undoubtedly, one of the key factors behind the spread of AGROVOC in the community is that it is free to access and use: it is distributed under a Creative Commons Attribution 3.0 (CC BY) licence.

AGROVOC is commonly used by researchers, librarians and knowledge managers for indexing, retrieving and organising data in information systems covering agriculture and the FAO's other areas of interest mentioned above. The FAO also practises dogfooding (an organisation using its own product to test and promote it): AGROVOC is used in AGRIS, the bibliographic database on agricultural science and technology, and in FAOLEX, a database of national legislation and policies on food, agriculture and natural resource management. Both are managed by the FAO.

AGROVOC can be accessed, searched and reused in several ways:

  • You can search for concepts or browse by hierarchy
  • It can be downloaded as an RDF dataset in two versions: Agrovoc Core (which includes all concepts in all languages but no links to external vocabularies) and Agrovoc LOD (which does include links to external vocabularies)
  • Available web services can be used
  • Searches can be performed through SPARQL queries, using a public SPARQL endpoint
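
As an illustration of the last option, here is a minimal Python sketch of a label search against a SPARQL endpoint. It rests on several assumptions that are not taken from this post: the endpoint URL is a placeholder to be replaced with the address the FAO currently publishes, the function names are hypothetical, and the query assumes that labels are modelled with the SKOS-XL properties skosxl:prefLabel and skosxl:literalForm, which is the usual pattern for a SKOS-XL concept scheme.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the post does not give the endpoint address. Replace this
# placeholder with the public SPARQL endpoint published by the FAO.
ENDPOINT = "https://agrovoc.fao.org/sparql"


def build_label_query(term: str, lang: str = "en", limit: int = 10) -> str:
    """Build a SPARQL query for concepts whose preferred label contains `term`."""
    # Only allow simple language tags such as "en" or "pt-br".
    if not lang.replace("-", "").isalnum():
        raise ValueError(f"unexpected language tag: {lang!r}")
    # Escape backslashes and double quotes so the term stays inside the literal.
    safe = term.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
SELECT ?concept ?label WHERE {{
  ?concept skosxl:prefLabel/skosxl:literalForm ?label .
  FILTER(LANG(?label) = "{lang}" &&
         CONTAINS(LCASE(STR(?label)), LCASE("{safe}")))
}} LIMIT {int(limit)}
""".strip()


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Wrap the query in an HTTP GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def search_concepts(term: str, lang: str = "en") -> list:
    """Run the query over the network and return (concept URI, label) pairs."""
    request = build_request(build_label_query(term, lang))
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [
        (row["concept"]["value"], row["label"]["value"])
        for row in data["results"]["bindings"]
    ]
```

Calling search_concepts("maize") would then return the matching concept URIs with their English labels, provided the endpoint is reachable and follows the standard SPARQL protocol. Building the query and the request in separate functions makes it possible to inspect both before anything is sent.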

The volume of agricultural data being generated is growing exponentially: environmental readings recorded by sensors, compilations of legal regulations, economic data on prices, crop production data, disease and pest data, and so on. All this knowledge, if carefully catalogued, can feed future studies and discoveries in both the public and private spheres. In this context, AGROVOC (and in the future, perhaps GACS) is a valuable tool for classifying data homogeneously, which facilitates the interoperability and reuse of the data both within and beyond a given organisation.