2020 is coming to an end, and in this unusual year we will experience a different, calmer Christmas with our closest circle. What better way to enjoy those moments of calm than to learn and improve your knowledge of data and new technologies?
Whether you are looking for reading that will improve your professional profile and fill your free time over the holidays, or you want to give your loved ones an educational and interesting gift, at datos.gob.es we would like to suggest some books on data and disruptive technologies that we hope will be of interest to you. We have selected books in Spanish and English, so that you can also put your knowledge of this language into practice.
Take note because you still have time to include one in your letter to Santa Claus!
INTELIGENCIA ARTIFICIAL, naturalmente. Nuria Oliver, ONTSI, red.es (2020)
What is it about?: This book is the first of the new collection published by the ONTSI called “Pensamiento para la sociedad digital”. Its pages offer a brief journey through the history of artificial intelligence, describing its impact today and addressing the challenges it presents from various points of view.
Who is it for?: It is aimed especially at decision makers, professionals from the public and private sector, university professors and students, third sector organizations, researchers and the media, but it is also a good option for readers who want an introduction to the complex world of artificial intelligence.
Artificial Intelligence: A Modern Approach, Stuart Russell, Peter Norvig
What is it about?: Interesting manual that introduces the reader to the field of Artificial Intelligence through an orderly structure and understandable writing.
Who is it for?: This textbook is a good option to use as documentation and reference in courses and studies in Artificial Intelligence at different levels, and for those who want to become experts in the field.
Situating Open Data: Global Trends in Local Contexts, Danny Lämmerhirt, Ana Brandusescu, Natalia Domagala – African Minds (October 2020)
What is it about?: This book provides several empirical accounts of open data practices, the local implementation of global initiatives, and the development of new open data ecosystems.
Who is it for?: It will be of great interest to researchers and advocates of open data and to those in or advising government administrations in the design and implementation of effective open data initiatives. You can download its PDF version through this link.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics), Trevor Hastie, Robert Tibshirani, Jerome Friedman – Springer (May 2017)
What is it about?: This book describes various statistical concepts in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the focus is statistical, the emphasis is on definitions rather than mathematics.
Who is it for?: It is a valuable resource for statisticians and anyone interested in data mining in science or industry. You can also download its digital version here.
Europa frente a EEUU y China: Prevenir el declive en la era de la inteligencia artificial, Luis Moreno, Andrés Pedreño – Kdp (2020)
What is it about?: This interesting book addresses the reasons why Europe lags behind the power of the US and China, and the consequences of that lag, but above all it proposes solutions to the problem it describes.
Who is it for?: It is a reflection for those interested in thinking about the change that Europe would need, in the words of its author, "increasingly removed from the revolution imposed by the new technological paradigm".
What is it about?: This book calls attention to the problems that can lead to the misuse of algorithms and proposes some ideas to avoid making mistakes.
Who is it for?: These pages contain no overly technical concepts, formulas or complex explanations, although they do deal with dense problems that demand the reader's attention.
Data Feminism (Strong Ideas), Catherine D’Ignazio, Lauren F. Klein. MIT Press (2020)
What is it about?: These pages address a new way of thinking about data science and its ethics based on the ideas of feminist thought.
Who is it for?: For all those who are interested in reflecting on the biases built into the algorithms of the digital tools that we use in all areas of life.
Open Cities | Open Data: Collaborative Cities in the Information, Scott Hawken, Hoon Han, Chris Pettit – Palgrave Macmillan, Singapore (2020)
What is it about?: This book explains the importance of opening data in cities through a variety of critical perspectives, and presents strategies, tools, and use cases that facilitate both data openness and reuse.
Who is it for?: Perfect for those involved in the data value chain in cities and those who have to develop open data strategies within the framework of a smart city, but also for citizens who are concerned about privacy and want to know what happens, and what can happen, with the data generated by cities.
Although we would love to include them all on this list, there are many interesting books on data and technology that fill the shelves of hundreds of bookstores and online stores. If you have any further recommendations, do not hesitate to leave your favorite title in the comments. The members of the datos.gob.es team will be delighted to read your recommendations this Christmas.
In the last few months we have spent so much time at home, and we have realized the importance of culture. Music, movies, reading or painting have made those hours spent at home much more bearable.
Cultural institutions have a lot of valuable information. This information includes the collections managed by cultural institutions, i.e. works that have been digitally shared free of charge by museums or libraries. But it also includes the knowledge available about those collections. All this material, if shared in open format, can be reused to develop, for example, educational and learning content, documentaries or animations.
What types of cultural data can I find in datos.gob.es?
At datos.gob.es we have an extensive catalogue of cultural data. There are currently 2,300 datasets that have been grouped under a category called "culture and leisure". Among these datasets we find state, regional and local information. The publishers that share the most data of this type are the Autonomous Community of the Basque Country, the Centre for Sociological Research, the National Institute of Statistics and the National Library of Spain.
Of these datasets, some of the most popular can be accessed through the following links:
- Spanish authors in the public domain
- Archive of the Spanish website: thematic collection: feminism
- Cultural Parks of Aragon
- Agenda of the museums in the province of Barcelona
- Free activities in Madrid's Municipal Libraries
As we can see, most of these datasets are related to the literary field, one of the most prolific and advanced when it comes to opening its data. However, in the catalogue we can also find data related to the pictorial, cinematographic or musical field.
One advantage of cultural datasets is their cross-sectional nature. For example, datasets such as the thematic collection on feminism or those on immigration provide us with information on our society, something important for understanding how we were and are.
From datos.gob.es we invite you to visit our data catalogue and dive into all the datasets.
Did you know that open digital heritage is fundamental to understanding the world around us, to boosting a more creative economy and to meeting agreed educational goals? It is estimated that around 90% of the world's cultural heritage has not yet been digitized. Of the remaining 10% that has been digitized, just 34% is available online, while only 3% of that work is open.
Most cultural institutions have a great deal of valuable information. This information is usually found mainly in the form of collections, but also in the files of authors, dates, technique, etc. Both the works and their information are susceptible to being converted into open data.
Some examples of museums that have digitized and opened their collections are the Rijksmuseum (Netherlands), the Statens Museum for Kunst (Denmark) and the Metropolitan Museum (USA).
What new products can be created by reusing cultural data?
New technologies offer us endless possibilities when it comes to creating new products through the reuse of cultural data. This is the case of new educational platforms such as BNEscolar, which allow art to be learned from a different point of view. Open cultural data can even be used to generate escape room games or new art pieces. They can also be used as design tools or to complement films and documentaries.
In short, cultural data have great value, and greater openness could help create new initiatives that benefit society as a whole.
Can we consider a work of art as data? When talking about open data, we usually think of statistical, meteorological or geospatial data, but we do not have in mind a painting, a song or a book. These resources can also become open data.
When we talk about open cultural data we refer to publications, photographs or musical collections created and distributed by institutions belonging to the cultural sector. It is not just a matter of digitizing the holdings, but also of enriching them with metadata that provide as much information as possible (author, date, technique, etc.) and of facilitating access under conditions that favour their reuse.
In this sense, libraries seem to have taken the lead in opening information. We have the example of the National Library of Spain, which launched the open data portal datos.bne.es and has launched different projects based on the reuse of its data, such as BNEscolar. Another example is the Miguel de Cervantes Virtual Library Foundation, whose catalogue consists of more than 230,000 records open for reuse.
Museums, meanwhile, are slowly embracing the commitment to open data, although there are an increasing number of institutions that are committed to sharing their collections openly.
Two examples of museums that have opened their collections
On February 7, 2017, the New York Metropolitan Museum implemented a new open data policy. The museum creates, organizes and disseminates a wide range of images and digital data that document the history of the museum and its collection, which is made up of more than two million works of art, from ancient Greece to European masters such as Raphael, Rembrandt or Velázquez. With this policy, the images of selected works of art that are in the public domain (and therefore free of copyright) have been made available to users without restrictions or cost, in accordance with the Creative Commons Zero (CC0) designation.
The museum's website has a search engine, which shows the different pieces of the collection. The user can separate those that are under the CC0 license, thanks to a filtering tool. In total there are 406,000 high resolution images, accompanied by basic information such as title, artist, date, medium and dimensions.
Another example is the National Museum of Amsterdam or Rijksmuseum, dedicated to the art, crafts and history of this region, which has a large collection of Dutch Golden Age paintings. The Rijksmuseum has an open data space where digital reproductions and associated data are collected. These data are made available to the public free of charge for all types of purposes, including commercial ones. When the works are free of copyright, this is explicitly indicated in the corresponding descriptive metadata. In these cases, the copyright notice reads "Public domain", with a reference to the Creative Commons Zero (CC0) license.
Web pages of the Metropolitan Museum of New York and the Rijksmuseum
Why should museums open their collections?
In this interview with the people in charge of the Musée de Bretagne, pioneer in the opening of data among French museums, the interviewees highlight how thanks to the opening of their collection they have achieved greater visibility. The opening of data gives the museum “a positive and innovative image in the French culture sector. It also generates new knowledge about the museum’s collections, thanks to feedback from online visitors”. This museum has a collection of 700,000 pieces, of which more than 200,000 are now visible and reusable online, including free high-resolution public domain images to download and use.
Along the same lines, we can find this study, which analyses the impact of the paintings and their metadata included in Wikidata and in Wikipedia in English. The study shows how the paintings included in Wikipedia are not only used to illustrate content related to art, but also to enrich other types of entries on diverse themes, such as history (for example, the paintings of kings that show us what they looked like) or basic concepts (showing what a mermaid looks like through a pictorial representation of one). These paintings help complement textual information while attracting users' attention to museum collections, driving an increase in views.
A field full of challenges, but also opportunities
Opening the works held by museums entails a series of challenges, such as the need to carry out a legal evaluation to identify the rights holders and the contracts in force, so that copyright is always respected, as well as the technical challenges involved. Museums will need a technological infrastructure, as well as resources to correctly catalogue all the works with their corresponding metadata (you can learn about the Prado Museum's experience in this interview).
But, on the other hand, if these challenges are overcome, the museum will gain multiple benefits, starting with the increase in visibility and the possibilities of its reuse to create valuable products and services.
The opening of data related to academic and research work entails multiple advantages, such as improved transparency, the possibility of replicating studies to verify their validity, or greater visibility and impact that boost the recognition of the researcher. In this sense, Law 14/2011, of June 1, on Science, Technology and Innovation highlights the need to boost open access to research content, even making it mandatory when research has been funded with public funds.
This situation creates a series of challenges for researchers in fields such as humanities, who often lack the technical knowledge and resources necessary to publish their work in accessible, open, free and updated formats. Therefore, they often hire technical collaborators outside the research.
In order to provide a solution to this situation, the National Distance Education University (UNED) created LINHD, a research centre in Digital Humanities. LINHD - whose acronym means Laboratory of Innovation in Digital Humanities - seeks to create a new framework with interdisciplinary and hybrid work teams, with experts in technical and humanities areas who collaborate to foster innovation and share ideas. It also offers training, and advisory, consulting and technological services.
LINHD is one of the key players in the dialogue on Digital Humanities and a pioneering centre in Spain in this area, because it deals with matters that are fundamental to the development of new technologies and to the competitiveness and productivity of the humanities.
The laboratory started from an initiative that aimed to digitally connect the research data of the university itself, in order to improve transparency and visualize them through linked data technology (UNEDATA project), but soon grew by hosting and developing multiple projects and services, with institutions such as the Goethe Institute in Germany, the Thyssen Museum or the National Library. Thus, LINHD has become the first internationally recognized digital humanities research centre in the Hispanic sphere, unique in its characteristics.
Through this philosophy, technology projects applied to the humanities are promoted, with special emphasis on fields such as art, philosophy, history, geography or education. This has led to projects that range from digital editions or visualizations of results to museums and virtual libraries.
In addition, the laboratory has been recognized as Clarin-K centre together with two centres of the UPF and the UPV, constituting the first “Knowledge Centre” of an ERIC-European Research Infrastructure Consortium in Spain.
LINHD also collaborates with DARIAH, the Digital Research Infrastructure for the Arts and Humanities, which seeks to enhance and develop digital humanities research in Europe.
Every January 1, new writers, painters, musicians and artists from all areas enter the public domain. This means that their artistic works may be edited, reproduced or disseminated publicly, without the restrictions established by copyright.
In Spain, artistic works enter the public domain 70 years after the death of their author, in accordance with the intellectual property law of 1987. However, for authors who died before that year, the term is 80 years. Therefore, this year the works of those authors who died in 1939 enter the public domain.
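The rule above can be sketched as a short calculation. This is only an illustration under simplifying assumptions: it compares calendar years rather than exact dates, and it assumes that protection runs until the end of the calendar year in which the term expires, so works become free on the following January 1. The function name and the `cutoff_year` parameter are hypothetical.

```python
def public_domain_year(death_year: int, cutoff_year: int = 1987) -> int:
    """Return the year on whose January 1 an author's works enter the public domain.

    Assumptions (simplified from the text): authors who died before
    `cutoff_year` get an 80-year term, everyone else a 70-year term, and
    the works are released on the January 1 that follows the end of the term.
    """
    term = 80 if death_year < cutoff_year else 70
    return death_year + term + 1


# An author who died in 1939 falls under the 80-year term.
print(public_domain_year(1939))  # 2020
```

Under these assumptions, 1939 + 80 + 1 gives 2020, which matches the authors mentioned in the text.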
The BNE and the free access to intellectual heritage
Each year, the National Library of Spain studies, publishes and digitizes the work of the authors in its catalogue who enter the public domain that year. This year the list consists of 181 authors, among them Antonio Machado, Ciro Bayo or Agustín Espinosa. The works of these authors are available in the Hispanic Digital Library, the portal that gives access to the digitized holdings of the BNE.
The list of authors in the public domain is open and collaborative, so that any interested person can collaborate in its elaboration by proposing Spanish authors with works in the catalogue of the National Library of Spain, who died between 1900 and 1939, and that are not yet included in the listings.
In addition, in order to enrich the information, the BNE has launched a collaborative project entitled “From the public domain: Spanish authors who died in 1939”, within its comunidad.bne.es platform, developed in collaboration with Red.es. Users who wish to share their knowledge and experience will help increase the data that the BNE holds on these authors, adding information such as their place and year of birth, occupation and/or a biographical sketch. This project joins others such as “La mujer del XIX [escrito en femenino]”, which recovers articles by prestigious women authors such as Blanca de los Rios, Pardo Bazán or Patrocinio Biedma, included in the work “Spanish, American and Lusitanian women, painted by themselves”, so that users can classify the stereotypes presented, catalogue the illustrations that accompany many of the texts and complete the data on their authors.
With the available information, the BNE has created complete files, which have been published as open and reusable datasets, in different formats, in the datos.gob.es data catalogue.
This action is part of the BNE's strategy for opening and promoting the reuse of its data and digital collections, along with other projects such as BNEscolar.
After New Year, it seems that Christmas comes to an end, but we still have a date marked on our agenda: Three Kings Day. Adults and children hope to get up on January 6 and discover what the Three Wise Men from the East have brought us. And what better gift than a book that can help us expand our knowledge and skills?
For those who have not yet finished their Christmas shopping and are rushing at the last minute, at datos.gob.es we have put together a selection of books on data and disruptive technologies that can make a good gift for your loved ones. We have books for all levels: basic ones, to encourage your younger relatives to pursue studies focused on data management and analysis (professions that will be in high demand in the coming years), and advanced ones, for professionals who want to improve their knowledge and gain a competitive advantage to boost their career in 2020.
Las bases de Big Data, by Rafael Caballero and Enrique Martín.
What is it about? A popular science book that explains what Big Data is and how it works, including details and curiosities that allow the reader to better understand the big data world, its processing and the business involved. It also explains basic aspects of the Hadoop ecosystem or databases, both relational and non-relational.
Who is it for? It is an introductory and easy-to-read book. The book does not include a technical vision, but it is detailed and critical so that the reader wants to continue going deeper into the subject.
Storytelling with data. Data visualization for Business professionals, by Cole Nussbaumer Knaflic.
What is it about? A book to learn how to tell stories using data. Cole Nussbaumer tells us about the fundamentals of data visualization through real examples that help to understand the theory in a simple way. The book helps readers reflect on the stories they want to tell and how to tell them, teaching them to choose different types of graphics and tools according to the audience.
Who is it for? It is a simple and quick-to-read book, perfect for those who work with data, do not have a technical profile and want to improve the way they show the results.
Introduction to Data Science: Data Analysis and Prediction Algorithms with R, by Rafael A. Irizarry.
What is it about? Rafael A. Irizarry presents concepts and skills to solve the challenges of real-world data analysis. The book covers concepts from probability, statistical inference, linear regression, machine learning, R programming, data visualization, predictive algorithms building, file organization with UNIX / Linux shell, version control with Git and GitHub and preparation of reproducible documents.
Who is it for? It is aimed at first-year data science students, so it is a perfect introduction to the subject.
Learning Path: Understanding Tool Integration for Big Data Architecture, by O'Reilly Media
What is it about? The book explains how to integrate Hadoop components with the goal of implementing big data solutions for a variety of use cases, including clickstream analytics, time series problems, transferring data between Hadoop and relational databases, and applications in the finance sector.
Who is it for? Book aimed at professionals with technical knowledge related to the universe of data or advanced students.
Prediction Machines: The Simple Economics of Artificial Intelligence, by Ajay Agrawal, Joshua Gans and Avi Goldfarb
What is it about? The book starts from a question: how should companies establish strategies, governments design policies and citizens plan their lives in a world marked by technology and Artificial Intelligence? Three eminent economists try to clarify this issue by demystifying artificial intelligence and examining it through standard economic theory.
Who is it for? For all those who want to understand the reality of artificial intelligence, although it is especially aimed at entrepreneurs, business leaders or public policy makers.
The State of Open Data: Histories and Horizons, by Tim Davies, Stephen B. Walker and Mor Rubinstein.
What is it about? A book that reviews the lessons learned in the 10 years of the open data movement and looks to the future to make the reader reflect on how open data initiatives will respond to new privacy concerns and to the inclusion of artificial intelligence.
Who is it for? For those involved in the open data ecosystem, but also those who are curious about the evolution of the movement. The book is also available in a free version here.
As in previous years, the list is just a selection that we have prepared based on recommendations from experts who collaborate with datos.gob.es, but we know that there are many more interesting books on these topics. Therefore, we encourage you to share new recommendations in the comments.
El Bosco, Tiziano, El Greco, Rubens, Velázquez, Goya ... The Prado National Museum has more than 35,000 works of unquestionable value, which make it one of the most important art galleries in the world.
In the year of its bicentennial, we visited the Prado with Javier Pantoja, Head of the Digital Development Area, to find out what innovative projects have been implemented to enrich the visitor experience.
- In recent years, the Prado Museum has launched a series of technological projects aimed at bringing its collections closer to citizens. How did this idea come about? What is the digital strategy of the Prado Museum?
The Prado Museum team does not only seek to develop a digital strategy; we want “the digital” to be part of the museum's global strategy. Therefore, the last 3 strategic plans have included a specific line of action for the digital area.
When we talk about digital strategy in the Prado, we refer to two aspects. On the one hand, we talk about processes, that is, the improvement of internal management. But, on the other hand, we also use digital tools to attract a larger audience to the Prado.
Our intention is to consider technology as a tool, not as an end. We have a strong commitment to technologies such as linked data or artificial intelligence, but always without losing the objective of spreading the
- What kind of technologies are you implementing and how are you doing it? Can you tell us some concrete projects?
The most significant milestone of the project was the launch of the current Prado website at the end of 2015. We were looking to create a semantic web, conceiving the collections and their related information in a different way that would allow us to go further and generate projects such as augmented reading or the timeline that we recently launched using Artificial Intelligence.
It has been a process of many years. The first task was to digitize and document the collection. For that we needed internal applications that allowed all areas of the museum to feed the databases with their knowledge in a homogeneous way.
Then, we had to make that information available to the public using semantic web models. That is, we needed to define each element semantically: what a “technique”, a “support”, a “painter”, a “sculptor”, etc. are. It was about creating relationships between all the data to result in knowledge graphs. For example, we want to link Goya to the books that speak about him, the kings who ruled in his time, the techniques he used, etc. The work was tedious, but necessary to obtain good results.
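The idea of defining elements semantically and linking them can be sketched with subject-predicate-object triples. This is a minimal illustration, not the museum's actual data model: the predicate names, the sample values and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Illustrative triples only; predicates and values are assumptions,
# not taken from the Prado's real knowledge graph.
triples = [
    ("Goya", "is_a", "painter"),
    ("Goya", "used_technique", "oil on canvas"),
    ("Goya", "lived_under", "Carlos IV"),
    ("Goya", "subject_of", "a hypothetical book about Goya"),
]


def build_graph(triples):
    """Index triples as subject -> predicate -> list of objects."""
    graph = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        graph[subject][predicate].append(obj)
    return graph


def related(graph, subject):
    """Return everything linked to a subject, grouped by relationship."""
    return {pred: list(objs) for pred, objs in graph.get(subject, {}).items()}


graph = build_graph(triples)
for predicate, objects in related(graph, "Goya").items():
    print(predicate, "->", objects)
```

Once every element has a defined type and explicit relationships, a question such as "what is linked to Goya?" becomes a simple lookup rather than a text search.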
This allowed us to create a faceted search engine, which helped bring the collection closer to many more users. With a boolean search engine, you have to know what you are looking for, while a faceted one makes it easier to find relationships.
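To illustrate the difference, a faceted search shows the user which values exist for each attribute, and how many items match each one, so the collection can be narrowed step by step without knowing the exact term in advance. The sketch below uses made-up records and hypothetical function names; it is not the engine used on the Prado website.

```python
from collections import Counter

# Hypothetical catalogue records, for illustration only.
works = [
    {"title": "Work A", "artist": "Goya", "technique": "oil", "support": "canvas"},
    {"title": "Work B", "artist": "Goya", "technique": "etching", "support": "paper"},
    {"title": "Work C", "artist": "Velázquez", "technique": "oil", "support": "canvas"},
    {"title": "Work D", "artist": "El Greco", "technique": "oil", "support": "canvas"},
]


def facet_counts(items, facet):
    """Count how many items fall under each value of a facet."""
    return Counter(item[facet] for item in items)


def refine(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


# The user first sees the available techniques with their counts...
print(facet_counts(works, "technique"))
# ...then narrows to one artist and sees the remaining options.
goya_works = refine(works, artist="Goya")
print(facet_counts(goya_works, "technique"))
```

Each refinement recomputes the counts, which is what lets users discover relationships they were not looking for.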
All this work of updating, reviewing and publishing information opened the way for a second part of the project: it was necessary to make the information attractive, through a beautiful and highly usable website. Do not forget that our goal was to "position the Prado on the Internet". We had to create an easy-to-use data web. In the end, the Prado is a museum full of works of art and we seek to bring that art closer to the citizens.
The result was a website that has won several awards and recognitions.
- And how did the augmented reading projects and the timeline come about?
They arose from a very simple issue. Let me give an example. A user opens the work sheet for La Anunciación by Fra Angelico. This sheet mentions terms such as Fiésole or Taddeo Gaddi. Most users probably do not know what or who they are. That is, the information was very well written and documented, but it did not reach the entire audience.
Here, we had two solutions. On the one hand, we could write a basic-level worksheet, but creating a text suited to everyone, regardless of age, nationality, etc., is very complex. That is why we chose another solution: what does a user do to find out about something he/she doesn't know? In this situation, users search on Google and click on Wikipedia.
As the Prado Museum cannot be an encyclopedia, we took advantage of the knowledge of Wikipedia and DBpedia. Mainly for 3 reasons:
- It has the precise knowledge structure
- It is a context encyclopedia
- It is an open source
- What challenges and barriers have you found when reusing this data?
At first, we would have had to annotate all the entities, cities, kings… one by one. That work was impossible, since there are currently 16,000 worksheets published on the web. In addition, the Prado is constantly studying and reviewing publications.
That is why we used a natural language recognition engine: the machine reads the text and the Artificial Intelligence understands it as a human would, extracts entities and disambiguates the terms. The machine processes the language, which it understands based on the context. For this task, we use the knowledge graph we already had and the relationships between the different entities through DBpedia.
The work is carried out together with Telefónica (the sponsoring company) and GNOSS (the developer), and the degree of reliability was very high, between 80% and 90%. Even so, there were complex issues. For example, when we talk about “the virgin's announcement”, we do not know whether we are referring to the concept, to the church in Milan, or to one of the paintings related to this subject... In order to ensure that everything was correct, the documentation service reviewed the information.
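The disambiguation problem described above can be illustrated with a toy example. This is not the engine used in the project: it is a deliberately simple sketch that scores each candidate meaning by the words it shares with the surrounding sentence and returns nothing when the context does not settle the question, leaving the case for human review. The candidate identifiers and context words are invented for the example.

```python
import re

# Hypothetical candidate meanings for one ambiguous mention.
CANDIDATES = {
    "annunciation": [
        {"id": "concept/annunciation", "context": {"gospel", "gabriel", "theme"}},
        {"id": "place/church-in-milan", "context": {"church", "milan", "building"}},
        {"id": "work/annunciation-painting", "context": {"painting", "panel", "tempera"}},
    ]
}


def disambiguate(mention, sentence):
    """Pick the candidate whose context words overlap most with the sentence.

    Returns None when no candidate matches or when the best score is tied,
    which signals that a person should review the case.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scored = sorted(
        ((len(c["context"] & words), c["id"]) for c in CANDIDATES.get(mention.lower(), [])),
        reverse=True,
    )
    if not scored or scored[0][0] == 0:
        return None
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]


print(disambiguate("Annunciation", "The church of the Annunciation in Milan"))
print(disambiguate("Annunciation", "The Annunciation is mentioned here"))
```

The second call returns None because the sentence gives no usable context, which mirrors the kind of case that was passed to the documentation service.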
- What was the next step?
We already had augmented reading. At this point, we asked ourselves: why not make these relationships visible? And so the Timeline emerged: a cluster of new knowledge graphs that allows the user to see, in a simpler way, the relationships between concepts that result from exploiting the linked data web.
Timelines tend to have just one layer, but we wanted to go further and create a multilayer structure that would allow us to understand and deepen the concepts: a layer of history, another layer of literature, architecture, philosophy, performing arts... In this way we can easily see, for example, what works were created during the Hundred Years' War.
For this, we had to review the datasets and select the concepts according to the interest they generate and their context within the Prado collection.
- These types of projects have great potential to be used in the educational field ...
Yes, our tool has a marked informative intention and has great potential to be exploited in educational environments. We have tried to make it easy for any teacher or disseminator to have at a glance the entire context of a work.
But it could also be used to learn history. For example, you can teach the Spanish War of Independence using “The second of May” and “The third of May”, paintings by Goya. In this way, more active learning could be achieved, based on the relationships between concepts.
The tool has an appropriate approach for secondary and high school students, but could also be used at other stages.
- What profiles are necessary to carry out such a project?
We create multidisciplinary teams, made up of designers, computer scientists, documentalists, etc., to carry out the entire project.
There are very good specialists doing semantics, working on linked data, artificial intelligence, etc. But it is also necessary to have people with the ideas to assemble the puzzle into something usable and useful for the user. That is to say, bringing together wills and knowledge around a global idea and shared objectives. Technological advances are very good, and you have to know them to take advantage of their benefits, but always with a clear objective.
- Could you tell us what are the next steps to be followed in terms of technological innovation?
We would like to reach agreements with other museums and entities so that all knowledge is integrated. We would like to enrich the information and link it with data from international and national cultural and museum institutions.
We already have some alliances, for example, an agreement with the National Film Library and the RTVE visual archive, and we would like to continue working along those lines.
In addition, we will continue working with our data to bring the Prado collection closer to all citizens. The Museum has to be the first reuser of its data sources because it knows them best and can get good results from them.
When planning our vacations, we all look for the perfect destination that meets our expectations: beach, mountain, city... But once we have decided where to go, we still have to make many decisions: how am I going to organize my trip so that everything goes perfectly? Luckily, we have hundreds of applications that make our lives much easier.
Nowadays you can use apps to calculate the most suitable route to your destination or to decide where to fill up the car during the journey without hurting your wallet. You can also use your mobile to look for accommodation or restaurants that offer good value for money, or to check the cultural activities in the area, including those aimed at the youngest members of the family. And, if you want to go to the beach, you can easily check the state of the sea, the wind or the temperature of the water without leaving the hotel.
All these applications, in addition to helping us organize our vacations in a simple way, have something in common: their functioning is based on open data from public administrations.
The fact that an increasing number of local administrations are opening their tourism data makes it possible to create services that help us manage our trips more efficiently, integrating information that is sometimes difficult to find. As an example, Asturias and Aragón are promoting catalogues of specific datasets focused on this area.
Many of these applications have been designed by individuals and companies, reusing available open data, but others have been promoted by the public administrations. This is because tourism open data not only help visitors, but also have great advantages for municipalities.
Tourism is a fundamental economic activity for our country. During the first 5 months of 2018, Spain received more than 28.6 million international tourists, an increase of 2% over the same period of the previous year. These tourists are a great source of income: in May alone, the average expenditure per tourist was 1,009 euros, 1.8% more than in 2017. It is not surprising, therefore, that all city councils want to promote their services and attract visitors.
Tourism applications based on open data can encourage interaction between visitors and the local community, promoting local services and fostering economic growth. In addition, some applications even allow information to be collected from users with their consent. The analysis of this anonymized information, combined with other datasets such as the total expenditure on trips by international tourists who take part in cultural activities or the number of tourists accommodated by municipality, makes it possible to identify tourists' behavior patterns and to design specific policies focused on innovation and the intelligent management of tourist destinations.
The tourism sector has traditionally carried great weight in Spain, but like all sectors it has to keep renewing itself so as not to be left behind, integrating new elements that help improve the visitor's experience. Open data, combined with new technologies such as Big Data analysis and artificial intelligence, are a good option, for example to make recommendations and customizations based on user behavior. The ultimate goal is to provide a high-quality global service that allows us to remain leaders and keep receiving millions of visitors year after year.
Data science is reaching all sectors, from agriculture to health, tourism or transport. Culture is no exception, as illustrated by the DataLab of the Juan March Foundation, a knowledge laboratory that not only seeks to manage and disseminate the Library's contents, but also extracts valuable information that can be used to optimize processes, make better decisions or create new services.
This DataLab was created in 2013, building on the experience gained in the Data Library of the former Center for Advanced Studies in Social Sciences (CEACS) of the Juan March Institute. The DataLab was born inside the Juan March Foundation Library in order to lead the organization of the digital knowledge produced by the Library itself and the rest of the Foundation's areas, dealing with the structure, curation and analytics of digital data.
The challenge was big. The Foundation had been created in 1955 and, since then, it had accumulated a valuable collection of contents in different formats: video, images, audio, texts... in addition to the large amount of knowledge organized and preserved for internal purposes in the Foundation's departments.
To face this challenge, a multidisciplinary team was assembled, bringing together librarians, technologists and mathematicians. Using technologies and methodologies specific to data science, they were able to explore new ways of analyzing and visualizing information.
Specifically, the DataLab has 4 areas of action:
- Data curation: One of the main tasks of this DataLab is the management and continuous maintenance of the digital repositories owned by the Juan March Foundation. Its responsibilities include the classification, documentation, storage, integration and digital preservation of data. Currently these data are disseminated through thematic knowledge portals (for example, focused on contemporary Spanish music or Spanish musical theater), visited by an average of 10,000 users per month, including researchers and specialists in social sciences and humanities who are looking for a source of inspiration for new projects.
- Analytics: One of the new features of this DataLab is its commitment to analytics applied to cultural data. On the one hand, the team applies analytical technologies to data curation, which results in automated processes that facilitate the classification of information. On the other hand, it also uses the data for analysis. In other words, the DataLab functions as a transversal service that provides business intelligence to other areas of the Foundation, through the creation of scorecards and the resolution of specific requests for information and analysis.
- Infrastructure: To implement a project of this magnitude, it was necessary to create an entire technological infrastructure that allows data to be captured from different sources, then organized and structured so that it can be exploited with different tools and processes. The team is therefore in a continuous process of development to redefine data capture, normalization, analysis and enrichment. All this is carried out using specialized Big Data environments in the cloud.
- Innovation: One of the fundamental pillars of this DataLab is experimentation with new technologies that can add an extra layer of value to data. For example, data capture processes are enriched thanks to Artificial Intelligence tools, which perform tasks ranging from sentiment analysis of social media to content transcription and automatic classification.
All this work has resulted in a project in continuous evolution and growth, which makes valuable information available to users, demonstrating the potential of data science applied to culture.