Interview with the team in charge of the Juan March Foundation DataLab
Date: 13-12-2018
Name: Paz Fernández
Sector: Culture and leisure
Organization, Institution or Company: Biblioteca Fundación Juan March
Country: Spain

In 2013, the Juan March Foundation decided to create a digital knowledge laboratory to centralize the data curation and analytics projects carried out across the organization. This is how the Juan March Foundation DataLab came about: a dynamic, integrative space that showcases the power of data science applied to culture and helps disseminate it.
Datos.gob.es spoke with Paz Fernández, director of the Foundation's Library, about her experience launching an initiative of this kind.
The Juan March Foundation DataLab has been a pioneering project in our country. What was the creation process like? What steps did you follow to develop your DataLab?
Libraries are living organisms and, without losing their identity, they must adapt to the times and to the community they serve. Sometimes changes are not chosen, but managers always have to be prepared to respond to new situations.
The process of creating the DataLab began in the fall of 2013. It originated in the transformation of the Juan March Institute's Center for Advanced Study in the Social Sciences (CEACS, 1987-2013) into the Carlos III-Juan March Institute of Social Sciences, and in the donation of the excellent CEACS library to the Carlos III University of Madrid.
This step led to a restructuring of the mission of the Juan March Foundation Library which, while keeping its strong humanistic specialization, took on all the expertise in working with raw data that the CEACS library staff had been applying for years. That human and intellectual capital (I would like to highlight the foresight of having had the only data librarian working in a Spanish library in those years) was at first tentatively redeployed, which led to the proposal of a core strategy, led by the Library, based on organizing the digital knowledge of the entire institution. As a result, the Library took on an additional function as a research support center, which evolved into a powerful section of the Library. Today, this section is fully established within the organization as the DataLab. In short, and paraphrasing Ortega y Gasset, "we plant our heels in the past to gain momentum and, one foot after the other, set off, walk and move forward".
Did you base the project on any other initiatives? What were your sources of inspiration?
First of all, we started from the conviction that quantitative (that is, measurable) methodology needed to be brought into the humanities and into products of the cultural world, in the same way that social scientists had been using statistical methods. In addition, around 2010 the first articles on digital humanities began to be published, and there were pioneering digital projects that were very useful for education (digital scholarship), carried out in interdisciplinary experimental environments set up in digital laboratories such as Harvard University's Digital Library Lab or the British Library Labs. These projects moved beyond the previous stage, characterized by the explosion of digital collections (digital repositories) of varying quality.
Moreover, 2013 is when organizations began talking not only about Big Data but also about machine learning as a suitable process for digital curation, and when the concept of data science was brought into data management through key works published that year, such as the Bad Data Handbook, edited by Q. Ethan McCallum, and Doing Data Science by Cathy O'Neil and Rachel Schutt.
At the DataLab, we identified the need to seek partnerships to strengthen our knowledge of statistical computing and mathematical machine learning models. One of these partnerships, and undoubtedly a successful one, was the collaboration with the Faculty of Mathematics of the Complutense University of Madrid, co-supervising projects from the Master's Degree in Computational Statistics of Information.
After almost 60 years of activity, the Foundation had a large amount of data, not only in the Library but also in other areas of the organization. How did you meet this challenge? What other barriers did you find, and how were they overcome?
More than 60 years of activity at the Juan March Foundation have produced a unique body of knowledge, mostly digitized, of immense cultural, artistic and social value. This heterogeneous collection includes publications, concerts, lectures, exhibitions and a photographic archive, as well as the bequests and the bibliographic and sound collections of its Library, which have been recorded for various purposes in multiple reference databases.
The proposal the Library presented to Management in 2013 consisted of creating a core strategy that would enrich and integrate data in collaboration with the other departments, so that digital information could be managed in a coordinated and uniform way. The biggest challenge was to open the doors of the Library and turn it into a cross-cutting, horizontal, interdepartmental service, convincing both the other departments and the Library team of the mutual benefits.
It has been five years. The barriers have been overcome through effort, study, commitment, generosity, a great deal of teaching and explaining (!), and tangible results that have demonstrated the advantages of introducing a culture of data use and reuse in an operational, dynamic organization.
The DataLab applies new data science technologies and methodologies to cultural data. What advantages can these technologies bring to the cultural sector?
Although we are not experts in many of the things we do, we believe it is important for an organization like ours to have spaces for innovation. Technology and data offer a unique opportunity to approach things from another point of view that can add value. The DataLab, like any laboratory, is a space for experimentation, and it is essential to accept that innovating requires trial and rehearsal. That means accepting that things may fail and that this is not wasted time, because we undoubtedly learn a great deal by analyzing and correcting mistakes.
We are convinced that these advantages and processes are here to stay, so sooner rather than later every organization that cares about excellence in its mission will have to work with its data. Measurement means designing a knowledge environment and a technological infrastructure to capture, clean, store, link and analyze data, in order to draw conclusions and recommendations that help us understand the organization as a whole and make decisions to keep improving.
We believe that, in this context, libraries have a leading role. Data is information, and we are experts in preserving, describing, organizing, analyzing, reusing and disseminating information, whether it reaches us from outside or we produce it ourselves.
Can you tell us about some of the projects you are currently involved in?
We try to cover the four main areas that make up our DataLab and that feed into each other: curation, analytics, infrastructure and innovation.
At the moment, in the curation area, we are working on a project to preserve every digital file produced by the Foundation and the Library. It is a huge project that involves working with digital objects that are scattered and difficult to identify, as well as with digital objects created every day, especially in audiovisual formats. It requires coordinated processes with other departments (Multimedia and Systems) and a sophisticated technological infrastructure for description (metadata) and retrieval, as well as for storage, security, auditing and monitoring.
We are also working to enrich the technological infrastructure that gathers, explores and extracts data in an integrated manner from multiple containers (the data layer), making possible, among other things, dynamic and intelligible dashboards and predictive analysis models using business intelligence.
In the innovation area, we are working on artificial intelligence and complex statistical models. The idea here is to test technologies and methodologies that allow us to do different things with the data we already have. We test storage and management tools as well as analysis and visualization tools. Our working methods are iterative and agile, which allows us to prototype rapidly.
In addition to disseminating the Library's cultural collections, you also analyze and reuse data enriched with internal information. What benefits do you get from this process?
Generally speaking, and unfortunately, despite the good work they could do, libraries seemed to operate in parallel to their institutions. That perception has grown in recent times because of the invisibility of the enormous work information managers do to make quality content openly accessible on the web, the drop in in-person users, and so on.
The vision we propose for taking this leap in responsibilities can be summed up in one line: the organization's data is valuable information that the library must preserve. This undoubtedly makes our day-to-day work more complicated, but the advantages are obvious. The Library has become more relevant within the organization and learns from it, and the organization learns from the DataLab.
You consider a multidisciplinary team to be one of the key factors in the success of the initiative. What profiles are essential when starting a DataLab with cultural data?
The DataLab is managed by the Foundation's Library. The head of the DataLab section is Luis Martínez-Uribe, a mathematician and data scientist, who has brought the knowledge and methodologies of data science into the DataLab. Working alongside him is Fernando Martínez Guzmán, a data engineer, who drives the intelligence built on, and the links between, the dozens of databases generated by the organization. The librarians themselves also collaborate, greatly facilitating the preparation of collections from bibliographic resources.
In addition, the DataLab works iteratively with all the researchers and cultural managers in the Foundation's departments (Art, Lectures, Music and, especially, Communication and Experience), answering specific questions as they are raised and discussed.
To all this, it is essential to add the relationship with research and with the outside world; in other words, it is fundamental to keep innovating: cooperation with universities and participation in specialized forums, the reading of the
What are your next steps? Are you planning any action to encourage third parties to reuse your data?
We are seriously considering sharing some of our raw databases for reuse on datos.gob.es. We already participate in Harvard University's open data repository for the social sciences (Harvard Dataverse), and we believe the time has come to facilitate and cooperate with other institutions, either by making our cultural data available or by letting people know that they can request it for research and study purposes.