The Juan March Foundation, a pioneer in the use of data science applied to culture

News date: 19-10-2018

Fundación Juan March

Data science is reaching every sector, from agriculture and health to tourism and transport. Culture is no exception, as illustrated by the DataLab of the Juan March Foundation, a knowledge laboratory that not only manages and disseminates the Library's content but also extracts valuable information that can be used to optimize processes, make better decisions or create new services.

This DataLab was created in 2013, building on the experience gained in the Data Library of the former Center for Advanced Studies in Social Sciences (CEACS) of the Juan March Institute. It was set up within the Juan March Foundation Library to lead the organization of the digital knowledge produced by the Library itself and by the rest of the Foundation's areas, handling the structuring, curation and analysis of digital data.

The challenge was considerable. The Foundation was created in 1955 and had since accumulated a valuable collection in a variety of formats (video, images, audio, text and more), in addition to the large body of knowledge organized and preserved for internal purposes in the Foundation's departments.

To meet this challenge, a multidisciplinary team of librarians, technologists and mathematicians was assembled. Using the technologies and methodologies of data science, they were able to explore new ways of analyzing and visualizing information.

Specifically, the DataLab has four areas of action:

  • Data curation: One of the main tasks of this DataLab is the management and continuous maintenance of the digital repositories owned by the Juan March Foundation. Its responsibilities include the classification, documentation, storage, integration and digital preservation of data. These data are currently disseminated through thematic knowledge portals (for example, on contemporary Spanish music or Spanish musical theater), which are visited by an average of 10,000 users per month, including researchers and specialists in the social sciences and humanities looking for a source of inspiration for new projects.
  • Analytics: One of the new features of this DataLab is its commitment to analytics applied to cultural data. On the one hand, it applies analytical technologies to data curation, which results in automated processes that facilitate the classification of information. On the other hand, it puts the data themselves to analytical use: the DataLab functions as a cross-cutting service that provides business intelligence to other areas of the Foundation, through the creation of scorecards and the resolution of specific requests for information and analysis.
  • Infrastructure: To implement a project of this magnitude, it was necessary to create an entire technological infrastructure that allows data to be captured from different sources, then organized and structured so that they can be exploited with different tools and processes. The team is therefore continuously developing and redefining data capture, normalization, analysis and enrichment. All of this is carried out using specialized Big Data environments in the cloud.
  • Innovation: One of the fundamental pillars of this DataLab is experimentation with new technologies that can add a further layer of value to the data. For example, data capture processes are enriched with Artificial Intelligence tools, which perform tasks ranging from sentiment analysis of social media to content transcription and automatic classification.

All this work has resulted in a continuously evolving and growing project that makes valuable information available to users and demonstrates the potential of data science applied to culture.