Interview

Collaborative culture and citizen open data projects are key to democratic access to information. They contribute to free knowledge, which drives innovation and empowers citizens.

In this new episode of the datos.gob.es podcast, we are joined by two professionals linked to citizen projects that have revolutionized the way we access, create and reuse knowledge. We welcome:

  • Florencia Claes, professor and coordinator of Free Culture at the Rey Juan Carlos University, and former president of Wikimedia Spain.
  • Miguel Sevilla-Callejo, researcher at the CSIC (Spanish National Research Council) and Vice-President of the OpenStreetMap Spain association.

Listen to the episode (in Spanish)

  1. How would you define free culture?

Florencia Claes: It is any cultural, scientific or intellectual expression that we, as authors, allow anyone else to use, take advantage of, reuse, rework and release back into society, so that others can do the same with that material.

In free culture, licences come into play: the permissions of use that tell us what we can do with those materials or expressions of free culture.

  2. What role do collaborative projects have within free culture?

Miguel Sevilla-Callejo: Having projects that are capable of bringing together these free culture initiatives is very important. Collaborative projects are horizontal initiatives in which anyone can contribute. A consensus is structured around them to make that project, that culture, grow.

  3. You are both linked to collaborative projects such as Wikimedia and OpenStreetMap. How do these projects impact society?

Florencia Claes: Clearly the world would not be the same without Wikipedia. We cannot conceive of a world without Wikipedia, without free access to information. I think Wikipedia is associated with the society we are in today; it has built what we are today, also as a society. The fact that it is a collaborative, open, free space means that anyone can join in and intervene, and that it maintains a high level of rigour.

So, how does it impact? It impacts in that (it will sound a little cheesy, but...) we can be better people, we can know more, we can have more information. It has an impact in that anyone with internet access can benefit from its content and learn without necessarily having to go through a paywall, or register on a platform and hand over personal data in order to access the information.

Miguel Sevilla-Callejo: We call OpenStreetMap the Wikipedia of maps, because a large part of its philosophy is copied, or cloned, from Wikipedia's. Where Wikipedia's contributors write encyclopedic articles, what we do in OpenStreetMap is enter spatial data. We build a map collaboratively, which means that the openstreetmap.org page, where you can go to look at the maps, is just the tip of the iceberg. That is where OpenStreetMap is a little more diffuse and hidden: most of the web pages, maps and spatial information you see on the Internet most likely come from the great free, open and collaborative database that is OpenStreetMap.

Many times you are reading a newspaper, you see a map, and that spatial data is taken from OpenStreetMap. It is even used by institutions: in the European Union, for example, OpenStreetMap is being used. It appears in information from private companies, public administrations, individuals, etc. And, being free, it is constantly reused.

I always like to bring up projects we have done here in the city of Zaragoza. We have mapped the entire urban pedestrian network, that is, all the pavements, the zebra crossings, the areas where pedestrians can move... and from this you can calculate how to get around the city on foot. You won't find this information about pavements and crossings on a commercial website, because it is not very lucrative, unlike getting around by car. But you can take advantage of it, as we did in some projects I supervised at university, to understand how mobility differs for blind people, wheelchair users or people pushing a pram.
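As a sketch of how the pedestrian data Miguel describes can be retrieved, here is a hedged example of building an Overpass API query for pavements and crossings. The tags (`footway=sidewalk`, `highway=crossing`) are standard OpenStreetMap tags; the bounding box for Zaragoza is an approximation chosen for illustration, not taken from the project itself:

```python
# Sketch: build an Overpass QL query for a city's pedestrian
# infrastructure in OpenStreetMap. The tag choices are the standard
# OSM tags for pavements (footway=sidewalk) and pedestrian crossings
# (highway=crossing); the bounding box is approximate.

def build_overpass_query(south, west, north, east):
    """Return an Overpass QL query for sidewalks and crossings in a bbox."""
    bbox = f"{south},{west},{north},{east}"
    return f"""
[out:json][timeout:60];
(
  way["highway"="footway"]["footway"="sidewalk"]({bbox});
  node["highway"="crossing"]({bbox});
);
out geom;
""".strip()

# Approximate bounding box around Zaragoza (assumption for illustration)
query = build_overpass_query(41.60, -0.95, 41.70, -0.83)
print(query)
# The query text would then be POSTed to an Overpass endpoint such as
# https://overpass-api.de/api/interpreter
```

The returned geometries could then feed a pedestrian routing graph like the one described above.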

  4. You are telling us that these projects are open. If a citizen is listening to us right now and wants to participate, what should they do? How can you become part of these communities?

Florencia Claes: The interesting thing about these communities is that you don't need to be formally associated or linked to them in order to contribute. In Wikipedia you simply go to the Wikipedia page and create a user account, or not, and you can edit. What is the difference? With a username you can better track the contributions you have made, but you do not need to be associated or registered anywhere to edit Wikipedia.

There are groups at the local and regional level linked to the Wikimedia Foundation that receive grants to hold meetings or activities. That is good, because you meet people with the same concerns who are usually very enthusiastic about free knowledge. As my friends say, we are a bunch of geeks who have found each other and feel we belong to a group in which we share and plan how to change the world.

Miguel Sevilla-Callejo: In OpenStreetMap it is practically the same: you can do it on your own. It is true that there is a slight difference with respect to Wikipedia. Apart from the openstreetmap.org page, all the documentation is on the project wiki, at wiki.OpenStreetMap.org, and you can go there and find everything you need.

It is true that to edit in OpenStreetMap you do need a user account, to better track the changes people make to the map. If edits were anonymous there could be more problems, because spatial data is not like the texts in Wikipedia. But, as Florencia said, it is much better if you join a community.

We have local groups in different places. One of the initiatives we have recently reactivated is the OpenStreetMap Spain association, in which, as Florencia said, we are a group of people who like data and free tools, and there we share all our knowledge. A lot of people come up to us and say "hey, I just discovered OpenStreetMap, I like this project, how can I do this? How can I do that?" And well, it is always much better to do it with other colleagues than alone. But anyone can do it.

  5. What challenges have you encountered when implementing these collaborative projects and ensuring their sustainability over time? What are the main challenges, both technical and social, that you face?

Miguel Sevilla-Callejo: One of the problems we find in all these movements that are so horizontal and in which we have to seek consensus to know where to move forward, is that in the end it is relatively problematic to deal with a very diverse community. There is always friction, different points of view... I think this is the most problematic thing. What happens is that, deep down, as we are all moved by enthusiasm for the project, we end up reaching agreements that make the project grow, as can be seen in Wikimedia and OpenStreetMap themselves, which continue to grow and grow.

From a technical point of view, for some particular things you need a certain computer proficiency, but the basics are very, very simple. For example, we have run mapathons, in which we meet in a room with computers and start entering spatial information for areas where, for example, there has been a natural disaster. Basically, on a satellite image, people trace the little houses they see, there in the middle of the Sahel for example, to help NGOs such as Doctors Without Borders. That is very easy: you open the browser, open OpenStreetMap and right away, with a few pointers, you are able to edit and contribute.

It is true that, if you want to do things that are a little more complex, you need more computer skills. So we always adapt. There are people entering data in a very pro way, including buildings, importing data from the cadastre... and there are people like a girl here in Zaragoza who recently discovered the project and is entering the data she finds with an application on her mobile phone.

I do notice a certain gender bias in the project, and within OpenStreetMap that worries me a little, because it is true that the large majority of the people who edit, including the community, are men, and in the end that does mean that some data carries a certain bias. But hey, we are working on it.

Florencia Claes: In that sense, the same happens to us in the Wikimedia environment. Worldwide, roughly 20% of the people participating in the project are women against 80% men, and that means that, in the case of Wikipedia for example, there are sometimes more articles about footballers. It is not a deliberate preference; simply, the people who edit have those interests, and as more of them are men, we have more footballers and we miss articles related, for example, to women's health.

So we do face biases and we face that coordination of the community. Sometimes people with many years participate, new people... and achieving a balance is very important and very difficult. But the interesting thing is when we manage to keep in mind or remember that the project is above us, that we are building something, that we are giving something away, that we are participating in something very big. When we become aware of that again, the differences calm down and we focus again on the common good which, after all, I believe is the goal of these two projects, both in the Wikimedia environment and OpenStreetMap.

  6. As you mentioned, both Wikimedia and OpenStreetMap are projects built by volunteers. How do you ensure data quality and accuracy?

Miguel Sevilla-Callejo: The interesting thing about all this is that the community is very large and there are many eyes watching. When there is a lack of rigour in the information, both in Wikipedia, which people know better, and in OpenStreetMap, alarm bells go off. We have tracking systems and it is relatively easy to spot dysfunctions in the data, so we can act quickly. This gives OpenStreetMap in particular the capacity to react and update the data practically immediately and to solve any problems that arise quite quickly. It is true that there has to be a person paying attention to that place or area.

I've always liked to describe OpenStreetMap data as a kind of beta map, borrowing the term from software: it has the very latest data, but there can be some minor errors. As a constantly updated, high-quality map it can be used for many things, but for others of course not, because for those we have the reference cartography built by the public administration.

Florencia Claes: In the Wikimedia environment we also work like this, because of the sheer number of eyes looking at what we and others do. Within this community, each person takes on a role. Some roles are formal, such as administrators or librarians, but others are informal. I like to patrol, so what I do is keep an eye on new articles, checking the articles published daily to see whether they need any support or improvement or whether, on the contrary, they are so bad that they need to be moved out of the main namespace or deleted.

The key to these projects is the number of people who participate, and everything is voluntary, altruistic. The passion is very high, the level of commitment is very high, so people take great care of these things. Whether data is curated for upload to Wikidata or an article is written on Wikipedia, each person does it with great affection, with great zeal. Then, as time goes by, they keep an eye on the material they uploaded, to see how it grew, whether it was used, whether it became richer or whether, on the contrary, something was erased.

Miguel Sevilla-Callejo: Regarding data quality, I find interesting, for example, a recent initiative by the Territorial Information System of Navarre. They have migrated all their data for planning and guiding emergency routes to OpenStreetMap. They got involved in the project and improved the information, but building on what was already there [in OpenStreetMap], considering that it was of high quality and much more useful to them than the alternatives, which shows the quality and importance this project can have.

  7. This data can also be used to generate open educational resources, along with other sources of knowledge. What do these resources consist of and what role do they play in the democratization of knowledge?

Florencia Claes: OER, open educational resources, should be the norm. Every teacher who generates content should make it available to citizens, and that content should be built in modules from free resources. That would be ideal.

What role does the Wikimedia environment play in this? It ranges from hosting information that can be used when building resources, to providing spaces to do exercises or to take data and work with SPARQL, for example. In other words, there are different ways of approaching Wikimedia projects in relation to open educational resources. You can step in and teach students how to identify data, how to verify sources, or simply how to read critically the way information is presented and curated, and, for example, compare it across languages.
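As an illustration of the kind of classroom exercise Florencia mentions, here is a minimal sketch of querying Wikidata's public SPARQL endpoint (query.wikidata.org) from Python. The identifiers used (P31 "instance of", Q3918 "university", P17 "country", Q29 "Spain") are real Wikidata IDs, but the exercise itself is hypothetical:

```python
# Sketch of a classroom exercise against Wikidata's SPARQL endpoint.
# P31 = "instance of", Q3918 = "university", P17 = "country",
# Q29 = "Spain" are real Wikidata identifiers; the exercise is
# illustrative, not taken from the interview.

import urllib.parse

QUERY = """
SELECT ?university ?universityLabel WHERE {
  ?university wdt:P31 wd:Q3918 ;   # instance of: university
              wdt:P17 wd:Q29 .     # country: Spain
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,es". }
}
LIMIT 10
"""

def query_url(endpoint="https://query.wikidata.org/sparql"):
    """Return the GET URL a student would fetch to run the query as JSON."""
    return endpoint + "?format=json&query=" + urllib.parse.quote(QUERY)

url = query_url()
print(url[:80])
```

Students can paste the same query into the Wikidata Query Service web interface, which is usually the friendlier starting point.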

Miguel Sevilla-Callejo: OpenStreetMap is very similar. What is distinctive is the nature of the data: it is not information in different formats as in Wikimedia. Here the information is the free spatial database that is OpenStreetMap itself. So the only limit is the imagination.

I remember a colleague who went to some conferences and made a cake with the OpenStreetMap map on it. He would hand it out to people and say: "See? These are maps we can even eat, because they are free." Whether you make serious cartography or informal, playful cartography, the only limit is your imagination. It is exactly the same as with Wikipedia.

  8. Finally, how can citizens and organizations be motivated to participate in the creation and maintenance of collaborative projects linked to free culture and open data?

Florencia Claes: I think we clearly have to do what Miguel said about the cake: make a cake and invite people to eat it. Speaking seriously about how to motivate citizens to reuse this data, I believe, above all from personal experience and from the groups I have worked with on these platforms, that a friendly interface is a very important step.

In Wikipedia, the visual editor was activated in 2015. The visual editor brought many more women into editing Wikipedia. Before, you could only edit in code, and at first glance that can seem hostile or distant, a "that's not for me". So we need interfaces where people do not require much prior knowledge: where they can see that this is a package with certain data, that they can read it with a certain program or load it into a certain tool. Making it simple, friendly, attractive will remove many barriers and put aside the idea that data is only for computer scientists. I think data goes further: we can all take advantage of it in very different ways. So I think this is one of the barriers we should overcome.

Miguel Sevilla-Callejo: Something similar happened to us. Until about 2015 (forgive me if the date is not exact), we had an interface that was quite horrible, almost like the code editing you have in Wikipedia, or worse, because you had to enter the data knowing the tagging scheme, etc. It was very complex. Now we have an editor where basically you are in OpenStreetMap, you hit edit and a super simple interface appears. You no longer even have to enter tags in English; it is all translated. Many things are pre-configured, and people can enter data immediately and very simply. That has allowed many more people to join the project.

Another very interesting thing, which also happens in Wikipedia although that project is much more focused on the web interface, is that an ecosystem of applications and services has grown up around OpenStreetMap. It has made possible, for example, mobile applications that, in a very fast, very simple way, let you enter data directly in the field, on foot. And this lets people contribute data in a simple way.

I wanted to stress it again, even though I know we keep coming back to the same point, because I think it is important: within these projects, we need people to be aware that the data is free, that it belongs to the community, that it is not in the hands of a private company, that it can be modified and transformed, that behind it there is a community of volunteers, and that none of this detracts from the quality of the data, which reaches everywhere. We need people to come closer and not see us as weirdos. Wikipedia is much more integrated into society's knowledge, and now with artificial intelligence even more so, but in OpenStreetMap it still happens that people look at you as if to say "what are you telling me? I use another application on my mobile", when in fact that application is using OpenStreetMap data without them knowing it. So we need to get closer to society, so that people know us better.

Returning to the issue of the association, that is one of our objectives: that people know us, that they know this data is open, that it can be transformed, that they can use it, and that they are free to take it and build, as I said before, whatever they want. The limit is their imagination.

Florencia Claes: I think we should somehow integrate maps and data into the classroom, into day-to-day schooling, through gamification and games. I think we would win a point in our favour there. And given that we are within a free ecosystem, we could integrate visualization and reuse tools into the data repositories' own pages, which I think would make everything much friendlier and would empower citizens in a way that encourages them to use these tools.

Miguel Sevilla-Callejo: It's interesting that there are things connecting both projects (we in OpenStreetMap and Wikipedia sometimes forget each other): there is data we can exchange, coordinate and combine. And that would also add to what you just said.

Subscribe to our Spotify profile to keep up to date with our podcasts

Interview

Open knowledge is knowledge that can be reused, shared and improved by other users and researchers without significant restrictions. This includes data, academic publications, software and other available resources. To explore this topic in more depth, we talk to representatives of two institutions whose aim is to promote scientific production and make it available in open access for reuse:

  • Mireia Alcalá Ponce de León, Information Resources Technician of the Learning, Research and Open Science Area of the Consortium of University Services of Catalonia (CSUC).
  • Juan Corrales Corrillero, Manager of the data repository of the Madroño Consortium.

 

Listen to the podcast here (in Spanish)

 

Summary of the interview

1. Can you briefly explain what the institutions you work for do?

Mireia Alcalá: The CSUC is the Consortium of University Services of Catalonia and is an organisation that aims to help universities and research centres located in Catalonia to improve their efficiency through collaborative projects. We are talking about some 12 universities and almost 50 research centres.
We offer services in many areas: scientific computing, e-government, repositories, cloud administration, etc. and we also offer library and open science services, which is what we are closest to. In the area of learning, research and open science, which is where I am working, what we do is try to facilitate the adoption of new methodologies by the university and research system, especially in open science, and we give support to data management research.

Juan Corrales: The Consorcio Madroño is a consortium of university libraries of the Community of Madrid and the UNED (National University of Distance Education) for library cooperation. We seek to increase the scientific output of the universities that are part of the consortium and also to increase collaboration between the libraries in other areas. We are also, like CSUC, very involved in open science: in promoting it and in providing infrastructures that facilitate it, not only for the members of the Consorcio Madroño but also globally. Apart from that, we also provide other library services and create structures for them.

2. What are the requirements for research to be considered open?

Juan Corrales: For research to be considered open there are many definitions, but perhaps one of the most important is given by the National Open Science Strategy, which has six pillars.

One of them is that both research data and publications, protocols, methodologies and so on must be put in open access. In other words, everything must be accessible and, in principle, without barriers for everyone: not only for scientists, not only for universities that can pay for access to this research data or these publications. It is also important to use open source platforms that we can customise. Open source is software that anyone with the right knowledge can modify, customise and redistribute, in contrast to the proprietary software of many companies, which does not allow any of this. Another important point, although it is still far from being achieved in most institutions, is allowing open peer review, because it lets us know who has done a review, with what comments, etc. It allows the peer review cycle to be redone and improved. A further point is citizen science: allowing ordinary citizens to be part of science, not only within universities or research institutes.
And another important point is adding new ways of measuring the quality of science.

Mireia Alcalá: I agree with what Juan says. I would also add that, for a research process to be considered open, we have to look at it globally, that is, including the entire data lifecycle. We cannot say a science is open if we only look at whether the data at the end is open. From the very beginning of the data lifecycle, it is important to use open platforms and to work in a more open and collaborative way.

3. Why is it important for universities and research centres to make their studies and data available to the public?

Mireia Alcalá: I think it is key that universities and centres share their studies, because a large part of research, both here in Spain and at European and world level, is funded with public money. Therefore, if society is paying for the research, it is only logical that it should also benefit from its results. In addition, opening up the research process can help make it more transparent, more accountable, etc. Much of the research done to date has been found to be neither reusable nor reproducible. What does this mean? That almost 80% of the time, someone else cannot take a study and reuse its data. Why? Because it does not follow the same standards, the same methods, and so on. So I think we have to extend this everywhere, and a clear example is in times of pandemic. With COVID-19, researchers from all over the world worked together, sharing data and findings in real time and working in the same way, and science was seen to be much faster and more efficient.

Juan Corrales: The key points have already been touched upon by Mireia. Besides, it could be added that bringing science closer to society can make all citizens feel that science is something that belongs to us, not just to scientists or academics. It is something we can participate in and this can also help to perhaps stop hoaxes, fake news, to have a more exhaustive vision of the news that reaches us through social networks and to be able to filter out what may be real and what may be false.

4. What research should be published openly?

Juan Corrales: Right now, according to the law we have in Spain, the latest Law of Science, all publications that are mainly financed by public funds, or in which public institutions participate, must be published in open access. This did not really have much repercussion until last year because, although the law came out two years ago, the previous law already said the same, as does a law of the Community of Madrid. But since last year it has been taken into account in the evaluation that ANECA (the Quality Evaluation Agency) carries out on researchers. Since then, almost all researchers have made it a priority to publish their data and research openly. Publishing data, above all, was something that had not been done until now.

Mireia Alcalá: At the state level it is as Juan says. We at the regional level also have a law from 2022, the Law of science, which basically says exactly the same as the Spanish law. But I also like people to know that we have to take into account not only the state legislation, but also the calls for proposals from where the money to fund the projects comes from. Basically in Europe, in framework programmes such as Horizon Europe, it is clearly stated that, if you receive funding from the European Commission, you will have to make a data management plan at the beginning of your research and publish the data following the FAIR principles.

 

5. Among other issues, both CSUC and Consorcio Madroño are in charge of supporting entities and researchers who want to make their data available to the public. What should a process of opening research data look like? What are the most common challenges and how do you solve them?

Mireia Alcalá: In our repository, which is called RDR (from Repositori de Dades de Recerca), it is basically the participating institutions that are in charge of supporting the research staff. Researchers often arrive at the repository in the final phase of the research, needing to publish the data "yesterday", and then everything is much more complex and time-consuming: it takes longer to verify the data and make it findable, accessible, interoperable and reusable.
In our particular case, we have a checklist that we require every dataset to comply with in order to ensure a minimum level of data quality, so that it can be reused. We are talking about having persistent identifiers, such as ORCID for the researcher or ROR to identify the institutions, having documentation explaining how to reuse the data, having a licence, and so on. Because of this checklist, researchers improve their processes as they deposit, and start to work on data quality from the beginning. It is a slow process. The main challenge, I think, is for researchers to realise that what they have is data, because most of them do not know it. Most researchers think of data as numbers from a machine that measures air quality, and are unaware that data can also be a photograph, film from an archaeological excavation, a sound captured in a certain atmosphere, and so on. Therefore, the main challenge is for everyone to understand what data is and that their data can be valuable to others.
And how do we solve it? By doing a lot of training and awareness raising. In recent years, the Consortium has worked to train data curation staff, who help researchers directly to refine this data. We are also starting to raise awareness among researchers themselves, so that they use the tools and understand this new paradigm of data management.

Juan Corrales: In the Madroño Consortium, until November, the only way to open data was for researchers to pass a form with the data and its metadata to the librarians, and it was the librarians who uploaded it to ensure that it was FAIR. Since November, we also allow researchers to upload data directly to the repository, but it is not published until it has been reviewed by expert librarians, who verify that the data and metadata are of high quality. It is very important that the data is well described so that it can be easily found, reusable and identifiable.

As for the challenges, there are all those mentioned by Mireia, such as researchers often not knowing that they have data. Also, although ANECA has helped a lot with the new obligations to publish research data, many researchers just want to get their data into the repositories quickly, without taking into account that it has to be quality data: it is not enough to deposit it, it is important that the data can be reused later.

6. What activities and tools do you or similar institutions provide to help organisations succeed in this task?

Juan Corrales: From Consorcio Madroño, the repository itself that we use, the tool where the research data is uploaded, makes it easy to make the data FAIR, because it already provides unique identifiers, fairly comprehensive metadata templates that can be customised, and so on. We also have another tool that helps create the data management plans for researchers, so that before they create their research data, they start planning how they're going to work with it. This is very important and has been promoted by European institutions for a long time, as well as by the Science Act and the National Open Science Strategy.
Then, beyond the tools, review by expert librarians is also very important. There are other tools that help assess the quality of a dataset, such as FAIR EVA or F-UJI, but what we have found is that what these tools end up evaluating is more the quality of the repository, of the software being used and of the requirements you impose on researchers when they upload metadata, because all our datasets get a fairly high and quite similar evaluation. So what those tools do help us with is improving both the requirements we put on our datasets and the tools we have, in this case the Dataverse software, which is the one we are using.

Mireia Alcalá: At the level of tools and activities we are on a par, because we have had a relationship with Consorcio Madroño for years, and just like them we have all these tools that help put the data in the best possible shape right from the start, for example the tool for making data management plans. Here at CSUC we have also been working very intensively in recent years to close the gaps in the data life cycle, covering infrastructure, storage, cloud, etc., so that, when the data is analysed and managed, researchers also have somewhere to go. After the repository, we move on to all the channels and portals that disseminate and make all this science visible, because it makes no sense to build repositories that sit in a silo; they have to be interconnected. For many years now, a lot of work has been done on interoperability protocols and on following the same standards. Therefore, data has to be available elsewhere too, and both Consorcio Madroño and we make it available everywhere possible.

7. Can you tell us a bit more about these repositories you offer? In addition to helping researchers to make their data available to the public, you also offer a space, a digital repository where this data can be housed, so that it can be located by users.
 

Mireia Alcalá: If we are talking specifically about research data, as we and Consorcio Madroño use the same repository software, we will let Juan explain its specifications, and I will focus on the other repositories of scientific output that CSUC also offers. What we do here is coordinate different cooperative repositories according to the type of resource they contain. So, we have TDX for theses, RECERCAT for research papers, RACO for scientific journals and MACO for open access monographs. Depending on the type of product, we have a specific repository, because not everything can live in the same place: each research output has its own particularities. Apart from these cooperative repositories, we also run other spaces for specific institutions, either with a more standard solution or with more customised functionalities. But basically that is it: for each type of research output there is a specific repository adapted to the particularities of each format.

Juan Corrales: In the case of Consorcio Madroño, our repository is called e-scienceData, but it is based on the same software as the CSUC repository, which is Dataverse. It is open source software, so it can be improved and customised. Although in principle development is managed from Harvard University in the United States, institutions from all over the world participate in it; thirty-odd countries have already contributed to its development.
Among other things, for example, the translation into Catalan has been done by CSUC, the translation into Spanish by Consorcio Madroño, and we have also taken part in other small developments. The advantage of this software is that it makes it much easier for the data to be FAIR and compatible with other portals that have much more visibility, because, for example, CSUC is much larger, but Consorcio Madroño comprises six universities, and it is rare for someone to look for a dataset directly in e-scienceData. They usually find it via Google or a European or international portal. With the facilities that Dataverse provides, they can search from anywhere and still end up finding the data we hold at Consorcio Madroño or at CSUC.
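To illustrate the kind of cross-portal discovery Juan describes, Dataverse installations expose a public Search API over HTTP. The sketch below only builds the query URL; the server address is illustrative (any public Dataverse installation would work), so this is a minimal sketch rather than a client for e-scienceData itself:

```python
from urllib.parse import urlencode

def build_search_url(server: str, query: str, dtype: str = "dataset") -> str:
    """Build a Dataverse Search API URL (GET /api/search)."""
    params = urlencode({"q": query, "type": dtype})
    return f"{server}/api/search?{params}"

# The host below is illustrative; substitute any public Dataverse installation.
url = build_search_url("https://demo.dataverse.org", "air quality")
print(url)  # https://demo.dataverse.org/api/search?q=air+quality&type=dataset
```

Fetching that URL returns JSON whose `data.items` list contains the matching datasets, which is how aggregators and search engines can surface records held in any Dataverse repository.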

 

8. What other platforms with open research data, at Spanish or European level, do you recommend?

Juan Corrales: For example, at the Spanish level there is FECYT, the Spanish Foundation for Science and Technology, which has a harvester that collects the research output of practically all Spanish institutions. Publications from all of them appear there: Consorcio Madroño, CSUC and many more.
Then, specifically for research data, there is a lot of research that should be put in a thematic repository, because that's where researchers in that branch of science are going to look. We have a tool to help choose the thematic repository. At the European level there is Zenodo, which has a lot of visibility, but does not have the data quality support of CSUC or the Madroño Consortium. And that is something that is very noticeable in terms of reuse afterwards.

Mireia Alcalá: At the national level, apart from Consorcio Madroño's initiatives and our own, data repositories are not yet widespread. We are aware of some initiatives under development, but it is still too early to see their results. However, I do know of some universities that have adapted their institutional repositories so that they can also hold data. And while this is a valid solution for those who have no other choice, it has been found that repository software not designed to handle the particularities of data - such as heterogeneity, diversity of formats, large size, etc. - falls somewhat short. Then, as Juan said, at the European level Zenodo is the established multidisciplinary and multi-format repository, born out of a project of the European Commission. I agree with him that, as it is a self-archiving and self-publishing repository - that is, I, Mireia Alcalá, can go there, spend five minutes uploading any document I have, with nobody having reviewed it, fill in the minimum metadata requested and publish it - the quality is clearly very variable. Some things are really usable and perfect, but others need a little more care. As Juan said, at the disciplinary level it is also important to stress that, in all those areas that have a disciplinary repository, researchers should go there, because that is where they will be able to use the most appropriate metadata, where everybody works in the same way, and where everybody knows where to look for that data. For anyone interested, there is a directory called re3data, which is basically a directory of all these multidisciplinary and disciplinary repositories. It is a good resource for anyone who does not know what exists in their discipline.

9. What actions do you consider to be priorities for public institutions in order to promote open knowledge?

Mireia Alcalá: What I would basically say is that public institutions should focus on making and establishing clear policies on open science, because it is true that we have come a long way in recent years, but there are times when researchers are a bit bewildered. And apart from policies, it is above all offering incentives to the entire research community, because there are many people who are making the effort to change their way of working to become immersed in open science and sometimes they don't see how all that extra effort they are making to change their way of working to do it this way pays off. So I would say this: policies and incentives.

Juan Corrales: From my point of view, the policies we already have on paper, at the national and regional levels, are usually quite correct, quite good. The problem is that often no real attempt has been made to enforce them. So far, from what we have seen for example with ANECA - which has promoted the use of repositories for research data and research articles - they have not really started to be used on a massive scale. In other words, incentives are necessary; it cannot be just a matter of obligation. As Mireia said, we have to convince researchers to see open publishing as their own, as something that benefits both them and society as a whole. That, I think, is what matters most: researchers' awareness.

Subscribe to our Spotify profile

Interview

Did you know that data science skills are among the most in-demand skills in business? In this podcast, we are going to tell you how you can train yourself in this field, in a self-taught way. For this purpose, we will have two experts in data science:

  • Juan Benavente, industrial and computer engineer with more than 12 years of experience in technological innovation and digital transformation. He has also been training new professionals in technology schools, business schools and universities for years.
  • Alejandro Alija, PhD in physics, data scientist and expert in digital transformation. In addition to his extensive professional experience focused on the Internet of Things (IoT), Alejandro also works as a lecturer in different business schools and universities.

 

Listen to the podcast (in spanish)

Summary of the interview

  1. What is data science? Why is it important and what can it do for us? 

Alejandro Alija: Data science could be defined as a discipline whose main objective is to understand the world, the processes of business and life, by analysing and observing data. In the last 20 years it has gained exceptional relevance due to the explosion in data generation, driven mainly by the irruption of the internet and the connected world.

Juan Benavente: The term data science has evolved since its inception. Today, a data scientist is someone working at the highest level of data analysis, often associated with building machine learning or artificial intelligence algorithms for specific companies or sectors, such as predicting or optimising manufacturing in a plant.

The profession is evolving rapidly, and is likely to fragment in the coming years. We have seen the emergence of new roles such as data engineers or MLOps specialists. The important thing is that today any professional, regardless of their field, needs to work with data. There is no doubt that any position or company requires increasingly advanced data analysis. It doesn't matter if you are in marketing, sales, operations or at university. Anyone today is working with, manipulating and analysing data. If we also aspire to data science, which would be the highest level of expertise, we will be in a very beneficial position. But I would definitely recommend any professional to keep this on their radar.

  2. How did you get started in data science and what do you do to keep up to date? What strategies would you recommend for both beginners and more experienced profiles?

Alejandro Alija: My basic background is in physics, and I did my PhD in basic science. In fact, it could be said that any scientist, by definition, is a data scientist, because science is based on formulating hypotheses and proving them with experiments and theories. My relationship with data started early in academia. A turning point in my career was when I started working in the private sector, specifically in an environmental management company that measures and monitors air pollution. The environment is a field that is traditionally a major generator of data, especially as it is a regulated sector where administrations and private companies are obliged, for example, to record air pollution levels under certain conditions. I found historical series up to 20 years old that were available for me to analyse. From there my curiosity began and I specialised in concrete tools to analyse and understand what is happening in the world.

Juan Benavente: I can identify with what Alejandro said because I am not a computer scientist either. I trained in industrial engineering and, although computer science is one of my interests, it was not my base. Nowadays, by contrast, I do see more specialists being trained at the university level. A data scientist today carries many skills, such as statistics, mathematics and the ability to understand everything that goes on in the industry. I have acquired this knowledge through practice. As for keeping up to date, I think that, in many cases, you can stay in contact with companies that are innovating in this field. A lot can also be learned at industry or technology events. I started in smart cities and have moved on to the industrial world, learning little by little.

Alejandro Alija: To add another source for keeping up to date, apart from what Juan has said, I think it is important to identify the technology manufacturers, the market players. They are a very useful source of information for staying current: identify their future strategies and what they are betting on.

  3. If someone with little or no technical knowledge wants to learn data science, where do they start?

Juan Benavente: In training, I have come across very different profiles: from people who have just graduated from university to people trained in very different fields who find in data science an opportunity to transform themselves and devote themselves to this. Thinking of someone who is just starting out, I believe the best thing to do is put your knowledge into practice. In projects I have worked on, we defined the methodology in three phases: a first phase covering the more theoretical aspects, taking into account mathematics, programming and everything a data scientist needs to know; once you have those basics, the sooner you start working and practising those skills, the better. Practice sharpens the wit and, both to keep up to date and to acquire useful knowledge, the sooner you join a project, the better. And even more so in a field that is updated so frequently. In recent years, the emergence of generative AI has brought further opportunities, including for new profiles who want to be trained. Even if you are not an expert in programming, there are tools that can help you with programming, and the same goes for mathematics or statistics.

Alejandro Alija: To complement what Juan says from a different perspective, I think it is worth highlighting the evolution of the data science profession. I remember when that paper about "the sexiest profession in the world" became famous and went viral, but then things adjusted. The first settlers in the world of data science did not come so much from computer science or informatics. There were more outsiders: physicists and mathematicians, with a strong background in mathematics and physics, and even some engineers whose work and professional development meant that they ended up using many tools from the computer science field. Gradually, the field has become more balanced. It is now a discipline that still has those two strands: people who come from the world of physics and mathematics towards the data itself, and people who come with programming skills. Everyone knows what they have to balance in their toolbox. Thinking about a junior profile who is just starting out, one very important thing - and we see this when we teach - is programming skills. I would say that having programming skills is not just a plus, but a basic requirement for advancing in this profession. It is true that some people do well without strong programming skills, but I would argue that a beginner needs those first programming skills with a basic toolset. We are talking about languages such as Python and R, which are the headline languages. You do not need to be a great coder, but you do need some basic knowledge to get started. Then, of course, specific training in the mathematical foundations of data science is crucial. Fundamental statistics, and more advanced statistics, are complements that, if present, will move a person along the data science learning curve much faster. Thirdly, I would say that specialisation in particular tools is important. Some people are more oriented towards data engineering, others towards the modelling world.
Ideally, specialise in a few frameworks and use them together, as optimally as possible.

  4. In addition to teaching, you both work in technology companies. What technical certifications are most valued in the business sector and what open sources of knowledge do you recommend to prepare for them?

Juan Benavente: Personally, it is not what I look at most, but I think it can be relevant, especially for people who are starting out and need help structuring their approach to the problem and understanding it. I recommend certifications in technologies that are in use in any company where you might want to end up working, especially from providers of cloud computing and widespread data analytics tools. These are certifications I would recommend for someone who wants to approach this world and needs a structure to help them. When you do not have a knowledge base, it can be confusing to know where to start. Perhaps you should reinforce programming or mathematical knowledge first, but it can all seem a bit complicated. Where these certifications certainly help is, in addition to reinforcing concepts, in making sure you are on the right track and know the typical ecosystem of tools you will be working with tomorrow. It is not just about theoretical concepts, but about knowing the ecosystems you will encounter when you start working, whether you are starting your own company or joining an established one - whether from Microsoft, Amazon or other providers of such solutions. This will allow you to focus more quickly on the work itself, and less on all the tooling that surrounds it. I believe this type of certification is useful, especially for profiles approaching this world with enthusiasm. It will help them both to structure themselves and to land well in their professional destination. They are also likely to be valued in selection processes.

Alejandro Alija: If someone listens to us and wants more specific guidelines, it could be structured in blocks. There are a series of massive online courses that, for me, were a turning point. In my early days, I tried to enrol in several of these courses on platforms such as Coursera, edX, where even the technology manufacturers themselves design these courses. I believe that this kind of massive, self-service, online courses provide a good starting base. A second block would be the courses and certifications of the big technology providers, such as Microsoft, Amazon Web Services, Google and other platforms that are benchmarks in the world of data. These companies have the advantage that their learning paths are very well structured, which facilitates professional growth within their own ecosystems. Certifications from different suppliers can be combined. For a person who wants to go into this field, the path ranges from the simplest to the most advanced certifications, such as being a data solutions architect or a specialist in a specific data analytics service or product. These two learning blocks are available on the internet, most of them are open and free or close to free. Beyond knowledge, what is valued is certification, especially in companies looking for these professional profiles.

  5. In addition to theoretical training, practice is key, and one of the most interesting ways of learning is to replicate exercises step by step. In this sense, from datos.gob.es we offer didactic resources, many of them developed by you as experts in the project. Can you tell us what these exercises consist of? How are they approached?

Alejandro Alija: The approach we always took was designed for a broad audience, without complex prerequisites. We wanted any user of the portal to be able to replicate the exercises, although clearly the more knowledge you have, the more you can get out of them. The exercises have a well-defined structure: a documentary section, usually a content post or a report describing what the exercise consists of, what materials are needed, what the objectives are and what it is intended to achieve. In addition, we accompany each exercise with two additional resources. The first is a code repository where we upload the necessary materials, with a brief description and the code of the exercise. It can be a Python notebook, a Jupyter Notebook or a simple script, containing the technical content. The second is a fundamental element aimed at facilitating the execution of the exercises. In data science and programming, non-specialist users often find it difficult to set up a working environment. A Python exercise, for example, requires having a programming environment installed, knowing the necessary libraries and making configurations that are trivial for professionals but can be very complex for beginners. To lower this barrier, we publish most of our exercises on Google Colab, a wonderful, open tool. Google Colab is a web programming environment where the user only needs a browser. Basically, Google provides a virtual computer where we can run our programmes and exercises without the need for special configurations. The important thing is that the exercise is ready to use, and we always check it in this environment, which makes learning much easier for beginners or less technically experienced users.

Juan Benavente: Yes, we always take a user-oriented approach, step by step, trying to make it open and accessible. The aim is for anyone to be able to run an exercise without complex configurations, focusing on topics as close to reality as possible. We often take advantage of open data published by bodies such as the DGT to make realistic analyses. We have developed very interesting exercises, such as energy market predictions or analyses of critical materials for batteries and electronics, which allow you to learn not only about the technology but also about the specific subject matter. You can get down to work right away, not only to learn, but also to explore the subject itself.
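In the spirit of the step-by-step exercises described above, a minimal self-contained sketch (the readings are invented for illustration, not taken from any real open dataset) might look like this:

```python
import statistics

# Invented toy data standing in for a downloaded open dataset
# (e.g. monthly air-quality or traffic readings).
readings = {"Jan": 42.0, "Feb": 38.5, "Mar": 45.2, "Apr": 40.1}

mean_reading = statistics.mean(readings.values())
peak_month = max(readings, key=readings.get)  # month with the highest value

print(f"mean reading: {mean_reading:.2f}")  # mean reading: 41.45
print(f"peak month: {peak_month}")          # peak month: Mar
```

A real exercise would swap the inline dictionary for a dataset downloaded from an open data portal and run in a ready-made environment such as Google Colab, exactly as Alejandro explains.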

  6. In closing, we would like you to offer a piece of advice that is more attitude-oriented than technical. What would you say to someone starting out in data science?

Alejandro Alija: As an attitude tip for someone starting out in data science, I suggest being brave. There is no need to worry about being unprepared, because in this field everything is still to be done and anyone can add value. Data science is multi-faceted: there are professionals closer to the business world who can provide valuable insights, and others who are more technical and need to understand the context of each area. My advice is to make the most of the resources available without panicking, because, although the path may seem complex, the opportunities are great. As a technical tip, it is important to be sensitive to how data is produced and used: the better one understands this world, the smoother the approach to projects will be.

Juan Benavente: I endorse the advice to be brave and add a reflection on programming: many people find the theoretical concept attractive, but when they get to practice and see the complexity of programming, some are discouraged by lack of prior knowledge or different expectations. It is important to add the concepts of patience and perseverance. When you start in this field, you are faced with multiple areas that you need to master: programming, statistics, mathematics, and specific knowledge of the sector you will be working in, be it marketing, logistics or another field. The expectation of becoming an expert quickly is unrealistic. It is a profession that, although it can be started without fear and by collaborating with professionals, requires a journey and a learning process. You have to be consistent and patient, managing expectations appropriately. Most people who have been in this world for a long time agree that they have no regrets about going into data science. It is a very attractive profession where you can add significant value, with an important technological component. However, the path is not always straightforward. There will be complex projects, moments of frustration when analyses do not yield the expected results or when working with data proves more challenging than expected. But looking back, few professionals regret having invested time and effort in training and developing in this field. In summary, the key tips are: courage to start, perseverance in learning and development of programming skills.

Interview

Linknovate, winner of the 1st edition of the Aporta Awards, is a software provider that tracks all the scientific production published on the internet, ranking the contents according to its own algorithm. Its search engine optimises the time spent searching for information, facilitating contact between the academic and business worlds.

We interviewed Manuel Noya and José López Veiga, partners at Linknovate, who told us about their experience and their views on the reuse of public information in Spain.

Linknovate is one of the biggest databases of science and technology, with more than 20 million references. What is the potential of the data that you make available to citizens and companies?

At Linknovate we do not focus so much on the number of documents, although we have very good coverage from 2010 to 2018, but on their quality and usefulness. It is about understanding perfectly which organizations are behind those documents, who their authors are, their keywords, etc. There are many scientific databases, but none puts the focus on cleaning these data and providing insights on them, and on broadening the perspective to what matters to companies: applications, related products... It is important to know what the specific activity of a company is, because it can be a potential partner or competitor.

What public information sources do you use to enrich your database?

Let's say that they can be divided into academic and industrial sources. In the academic world, we have scientific publications and conference proceedings with a similar coverage to Elsevier's Scopus, one of the most complete (and expensive) academic databases. On the other hand, in the industrial world, we obtain information from US and European trademark and patent registrations, news, corporate websites, etc. We could include a third type of sources, the academic-industrial mix, where we could find scholarships and European (FP7, H2020) and American (NSF, SBIR / STTR, DOE) projects.

In your opinion, what are the main sectors of activity that take advantage of Linknovate's open data potential? Who reuses the data and what is their objective?

Professionals from strategy, technology and innovation sectors, with the goal of making data-based decisions (business intelligence), the development of new products and searching for improvements (in products and processes). We target both industry professionals and researchers from technology centres and institutes.

How could we promote open scientific data in Spain?

Promoting and rewarding those who open up data and, above all, ensuring that the data is of adequate quality. For example, it is important to promote "machine-readable" data, that is, data readable by a machine without investing resources in cleaning and structuring the information. Much innovation data in Spain, for example, data related to companies that receive public funding, is public (you can see the redundancy if you try to explain it), but a large majority of it is locked in PDFs that cannot be processed without human intervention.
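The difference is easy to see in code: a machine-readable format such as CSV can be consumed in a few lines without human intervention, whereas a scanned PDF cannot. A minimal sketch with invented figures:

```python
import csv
import io

# Illustrative machine-readable records (invented data): companies and the
# public funding they received, as they might appear in an open CSV file.
raw = "company,grant_eur\nAcme SL,120000\nBeta SA,85000\n"

rows = list(csv.DictReader(io.StringIO(raw)))
total = sum(int(row["grant_eur"]) for row in rows)
print(total)  # 205000
```

The same figures locked inside a scanned PDF would require manual transcription or error-prone extraction before any such aggregation could be done.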

If an organization is motivated to share its data, with quality, thanks to a system of incentives, in the long term that organization and others in the sector will confirm the benefits, as happens with open-source software, reaching levels difficult to achieve on one's own, without a community. Without incentives, this barrier is difficult to overcome, although the trend is clearly positive: there are more and more success stories.

What measures do you consider necessary to encourage national private sector companies to open, reuse and create innovative services based on open data?

In certain "structural" sectors it should be mandatory for companies to share their data as part of the services they provide (for example, concessionaires or bid winners), especially in sectors such as health, energy, public financing, etc. Always taking care of user’s privacy, of course.

In other cases, increasing public subsidies and incentives (tax incentives, innovation bonds, etc.) can be the boost a company needs to test what it can do with its open data. Promoting startups and SMEs that create value from these data makes the ecosystem grow and become sustainable: companies find more and more "answers" to the challenges hidden in their data.

What are the next steps that Linknovate will follow as regards open data?

Now we are focusing on a new functionality: the ability to "follow" a topic (thanks to our ability to follow companies and / or research groups, and find news, patents, publications and almost any document related to innovation). A second part of this "alert system" is the ability to make a brief summary and visualize "insights", such as which new companies have searched for the topic or what new applications are more popular, among others.

Do you think initiatives such as the 2017 Aporta Awards can help boost the reuse of open data? What other initiatives of this kind do you think should be put into practice?

Of course. Awards, such as Aporta, help to give visibility to small companies that make up this ecosystem.

We believe that facilitating private contests and competitions is key (where both the data and the need or challenge to solve come from private companies and industry). The public sector can and should lead the way, but this is only sustainable when industry is aligned. The public sector should set an example of how to share open data easily, with quality and traceability. The European Data Portal and CORDIS are examples to follow.

Interview

Interview with Antonio F. Rodríguez Pascual, Assistant Deputy Director of CNIG.

  1.  What is the place of geospatial information within the general open data ecosystem?

To provide some figures, geographic information is inevitably present in seven (election results, national map, weather forecasts, pollutant emissions, location datasets, water quality, land ownership) of the thirteen essential datasets included in the Global Open Data Index of the Open Knowledge Foundation. It is also one of the types of information recognized as "highly important" by the Open Data Charter. According to ASEDIE's report on the Infomediary Sector 2016, it is the third sector of the re-use market in terms of annual turnover, at a little more than 254 million euros, behind the Economic (20 %) and Market Research (27 %) industries. And that is without taking into account that the geospatial component is also present in two of the most relevant sectors: Market Research and Meteorological Information.

This type of information is considered among the most important. It is worth reflecting that, together with statistical data, it is the best model of the real world we have for making decisions, studying problems, analysing phenomena, managing our resources, making plans and, in general, learning about the world.

It is a key epistemological tool. We always say that everything happens somewhere and, indeed, if we consider that indirect references, such as those defined by addresses, are also geographical, more than 80 % of the data managed by organizations are geographic data. Its importance has increased enormously with the near-permanent location of people through smartphones, the proliferation of sensors and georeferenced cameras, and the development of the Internet of Things (IoT), which is already generating a sort of infosphere, a virtual space full of resources with coordinates that reflect and describe reality.

“More than 80 % of the data managed by organizations are geographic data.”

2. How has the European Directive INSPIRE helped the openness of geospatial data in Spain? At what point has this sector reached nowadays?

The implementation of the INSPIRE Directive in Europe and, especially, in Spain has contributed significantly to the openness of geospatial data. The European Commission has long recognized the synergy between the two activities. The fact is that in the countries where the INSPIRE Directive is best established (the Nordic countries, the Netherlands, the United Kingdom, Spain...) there are more open data, and vice versa. This is probably because the INSPIRE Directive has spread and promoted the idea that it is very positive to share geographic resources, both data and services, and this idea has extended to other sectors.

In Spain especially, with the development of the Spatial Data Infrastructure of Spain (IDEE), whose motto is "If you share, you always gain more" (a quote from "La buena suerte" by Álex Rovira and Fernando Trías de Bes, Urano, 2004), the volume of open data has increased significantly in recent years.

In an analysis made by the CNIG in 2016 of more than 100 Spanish geographic-information pages where official geospatial data can be downloaded, 20 % offered partially open data (for non-commercial uses), 20 % closed data, 32 % completely open data and 36 % did not specify the conditions of use. We suspect that in a good number of those latter cases the intention is also to publish open data, so we can perhaps speak of at least 50 % of the official geographic data on offer being open.

Much progress has been made in the openness of geospatial data in Spain, but we are not completely satisfied; we have to go further. Antonio Gramsci said that "crises are those moments in which the old dies without the new being born", and I think that is the current situation in the field of Geomatics. There is an obsolete order, based on desktop applications, the accumulation of data in silos as capital to be made profitable and monolithic nuclei of power that do not quite disappear, and a new order, based on resources in the cloud, service-oriented architectures, open and networked organizations and new business models, that has not yet fully taken hold.

It is sharing with indirect benefits versus accumulating: a necessary change that points towards the information society, towards what the EU calls the Digital Single Market. It falls to us technicians to ensure that technological revolutions take place at the desired pace and that their collateral effects are minimized.

According to the above-mentioned ASEDIE report, the geospatial sector occupies second place in the infomediary sector in terms of jobs generated, with 2,976 employees (19 % of the sector), behind only market research (33 %). It has a light financial structure, holding only 6 % (almost 18 million euros) of the capital subscribed by the sector, and one of the lowest default risks.

And the 2014 "Study of characterization of the infomediary sector of Spain", carried out by ONTSI, establishes that the geographic information sector is the most important within the re-use of public sector information, with 35 % of the companies, since urban planning and meteorological forecast information are included.

3.  According to your experience, what barriers hinder the openness and re-use of geospatial information in the public sector? What solutions do you propose to eliminate such obstacles?

It is a difficult question to answer; there is a wide variety of barriers and difficulties. First of all, I believe there is a natural resistance to change: to speak metaphorically, our environment is full of boatmen who want to charge tolls to those who cross the bridges. But it must be recognized that changes of mentality are not easy, and that the administration is moving, in a short time, from being considered part of a government that directs society's life to being an actor that manages resources belonging to all citizens and is at their service; the data-producing agencies are evolving into web service providers, opening up to collaborate with other public and private organizations... There are many changes, described very well, for example, by Enrique Dans in "Everything Is Going to Change" and by Pekka Himanen in his book published in 2002, "The Hacker Ethic and the Spirit of the Information Age".

Secondly, there is also a lack of training in Web 2.0 technologies, the ISO 19100 family of standards, the OGC standards and the applications that implement them, which is not easy to overcome in a short time. These are new technologies involving complex formats and languages (UML, XML, GML...), very specialized models and a new way of working.
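As a hint of what these formats look like in practice, the sketch below builds and parses a minimal GML point encoding using only Python's standard library; the coordinate values and the exact element layout are illustrative, not taken from any particular IGN product.

```python
import xml.etree.ElementTree as ET

# A minimal, illustrative GML point (coordinates are made up):
# the namespace declaration is what makes these formats verbose
# but unambiguous.
GML_NS = "http://www.opengis.net/gml"
gml_point = (
    f'<gml:Point xmlns:gml="{GML_NS}" srsName="EPSG:4326">'
    '<gml:pos>40.4168 -3.7038</gml:pos>'
    '</gml:Point>'
)

# Any standards-aware XML parser can recover the data without a
# proprietary reader.
root = ET.fromstring(gml_point)
lat, lon = (float(v) for v in root.find(f"{{{GML_NS}}}pos").text.split())
print(lat, lon)  # 40.4168 -3.7038
```

This is the kind of structure that training in XML and GML makes routine to produce and consume.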

“The data producing bodies are evolving to become web service providers, they open up to collaborate with other public and private organizations…”

Thirdly, it should be mentioned that public administration is a machinery with considerable inertia, with bureaucratic procedures that are sometimes very cumbersome and staff whose skills are difficult to update. Finally, greater political support and commitment at the highest level would always be desirable.

As regards solutions to overcome these barriers, apart from the obvious ones, such as training courses, awareness events, implementation of electronic procedures, etc., all of them areas in which a great effort is being made and in which Spain is well positioned (our country leads the European open data and e-government rankings after Brexit), we would like to mention two lines of action that seem particularly useful:

- Openness to the web. I think it is very positive for an administration to be present on the web and to interact with its users through blogs, mailing lists, social networks and surveys. This allows us to learn about their concerns, empathize with them and serve their needs better.

- Strategic planning. Our experience implementing the Strategic Plan of the IGN and the CNIG has been excellent. It steers all human and physical resources in the same direction; it establishes the vision, mission and objectives of the organization in a clear and participatory way; it integrates and motivates the staff; and it defines a set of indicators to measure the institution's degree of success objectively. Bear in mind that maximizing economic profit is not the purpose of the administration; the purpose is rather the best possible ratio of investment to social impact, and these variables are better evaluated when there is a Strategic Plan.

In summary, a Strategic Plan gives meaning to all the activities of an organization, steering them in the same direction and integrating the staff. It is a tool that is always advisable, but even more so in processes of paradigm change.

4. Nowadays, the IGN is committed to publishing its data under the CC BY 4.0 license. To what extent do you consider the use of this type of license essential in promoting data re-use?

The use of implicit licenses, which imply tacit acceptance, and of standard licenses is essential because they enable license interoperability. Indeed, the alternative of defining one's own license in a text, which has to be written in one of Spain's official languages, has the serious drawback that it forces users from countries with a different official language (Germany, France, the United Kingdom, China, Japan, Korea and the Arab countries, for example) to tackle a series of difficult tasks if they want to use our data to georeference other information under another license and create a value-added work with full legal guarantees: commissioning a sworn translation into the language of the other license text, obtaining a legal opinion on how the terms of the two licenses combine... The Creative Commons 4.0 licenses, in contrast, are internationally known and well-defined standard licenses that we already know how to mix and hybridize with other types of licenses.

As for a license that only requires attribution, we believe it is the freest and least restrictive license, since it focuses on the essential part of copyright, the moral rights, without worrying about what users may do with the data. In that sense, some organizations ask themselves what private companies are going to do with their data and whether they are going to profit from data whose generation costs they did not bear. The answer to this concern is that everything they do with the data is positive for society: generating jobs, wealth and profit, paying taxes, disseminating the data, giving them usefulness and meaning, creating economic activity, and so on.

5.  How has the data download policy at the National Geographic Information Center (CNIG) evolved in recent years, and what has the change meant for the Center?

In a first stage, which lasted almost 20 years, from 1989 to 2008, the CNIG commercialized the IGN's geographic data according to a price order, with discounts of up to 90 % for research, in line with the data policies prevailing throughout Europe at the time.

In a second phase, which began with Ministerial Order FOM/956/2008 and lasted seven years, until the end of 2015, the most important data products of the IGN (geodesic vertices, boundary lines, gazetteers and population databases) were defined as National Reference Geographic Equipment (EGRN) and established as open data. The remainder were defined as free data for non-commercial uses. In this way, the IGN became the first official cartography producer in Europe to partially open up its data. However, this policy generated a considerable and growing overhead, since ever more complicated use cases kept arising in which it was not trivial to decide whether or not a use was commercial.

Finally, in December 2015, Ministerial Order FOM/2807/2015 was adopted, which defines all IGN data products and geographic services as open, subject only to attribution, and which has placed us among the most advanced countries in the world in the re-use and publication of open resources. Predictably, the aforementioned Global Open Data Index 2016, to be published soon, will place Spain in first place in the international classification of open geographic data, along with 11 other countries.

“In 2015, the Ministerial Order FOM/2807/2015 was adopted, which defines all IGN data products and geographical services as open”

6. How far has the CNIG come in opening up its information? What steps will the institution take next in terms of open data?

At first we believed that open data was a matter of free information; then we realized that the terms of use (the license) were more important; and now we are aware that actually publishing open data means making a continuous and constant effort to minimize the barriers that hinder its use, something that of course involves gratuity and an open license, but also a number of details, as reflected in the Open Knowledge Foundation's open knowledge definition. As an example, the option to download whole products in a single run has had more impact on the number and volume of downloads from the CNIG than the new use license. The most important thing is to minimize barriers of all kinds and to keep progressing along that line, which is what we will devote ourselves to from now on.

"Publishing open data means a continuous and steady effort to minimize barriers that hinder their use"

We should also, as the National Contact Point for the implementation of the INSPIRE Directive in Spain, deepen the implementation of that Directive and, of course, continue to collaborate with red.es in the integration of national open geographic data into our country's data portal.

In that sense, our next probable steps will be:

- Using a CC BY 4.0 license, once all the bureaucratic and administrative processes are completed.

- Progressing in the publication of information in open formats. This aspect still has a long way to go in the field of geographic information, where the most effective, convenient and widespread formats, such as shapefile and ECW, are often not open formats.

- Formally defining a PSI Re-use Plan for IGN digital geographic data.

- Disseminating and promoting the publication of open data and services.
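On the open-formats point above, a minimal sketch of what publication in an open format buys: GeoJSON (RFC 7946) is plain JSON text, so any standard library can produce and consume it with no proprietary reader. The feature below is invented for illustration.

```python
import json

# An illustrative GeoJSON feature (name and coordinates invented):
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-3.7038, 40.4168]},
    "properties": {"name": "Madrid (example)"},
}

# Round-tripping through plain JSON text: no special tooling needed,
# unlike binary formats such as shapefile or ECW.
encoded = json.dumps(feature)
decoded = json.loads(encoded)
print(decoded["geometry"]["coordinates"])  # [-3.7038, 40.4168]
```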

7. From your point of view, what are the main reasons that should encourage public administrations in Spain to open up their data?

We have identified up to ten good reasons to open up the data generated by the Public Administrations:

1) Once a public body generates data with public resources, derived from citizens' taxes, and in the exercise of its functions, ultimately aimed at serving the needs of society, a question arises: to what extent is it entitled to invoke copyright to limit those same citizens' access to the data produced?

2) Open data are beneficial to a country's economy, as successive studies and analyses have shown. As early as 2000, the well-known PIRA report, commissioned by the European Commission to conduct an extensive economic analysis of the exploitation of Public Sector Information (PSI) in Europe, made an extensive comparison between the USA and Europe, two economies of very similar size at that time. It concluded that each euro invested in the production of public geographic data in the USA translated into an increase of approximately €44 in the PSI sector, while in Europe that increase was only about €8. One of the reasons for this difference was that federal geographic data in the USA were open, while in Europe they were completely closed. All the studies carried out since then on the same subject have confirmed these conclusions.

3) Several international initiatives directly related to economic development promote the adoption of open data policies as a clear factor of growth and social benefit, and underline the importance of geographic data as one of the priority types of information in this regard. To mention only three of them: the Open Data Charter promoted by the G8 in 2013, the International Open Data Charter supported by the G20 and the United Nations in 2015, and the 2030 Agenda for Sustainable Development all recommend open data, especially geographic data.

4) According to the CNIG's experience in recent years, we can say that, owing to globalization, the democratization of cartography, the free-of-charge economy and the emergence of web services, revenues from commercial use licenses are increasingly irrelevant. In 2015, they accounted for only 8 % of the CNIG budget. It can be concluded, in general terms, that the commercialization of geographic data ceased to be a good business long ago.

5) An open data policy allows an official geographic data producer to take advantage of the possibilities of neocartography and Volunteered Geographic Information. After all, how can citizen volunteers be motivated to collaborate in the collection and production of geographic data if, in the process, they must give up ownership of data that we then intend to sell back to them?

6) On the other hand, it allows the geographic institutes to compete on equal terms, as regards permitted uses and licenses, with other decidedly popular actors that offer geographic data and services under open or fairly open conditions: for example, OpenStreetMap, with its ODbL license similar to CC BY-SA; GeoNames, with its CC BY license; or the Google Maps API and Carto services, with their freemium business models, which offer free service up to a certain number of queries if certain conditions are met.

7) If the official bodies that provide geographic data offer reference data, that is, geographic data whose purpose is to georeference data on other subjects, it is logical for them to promote their use in all types of applications and by all types of users, and the best way to do that is clearly to release them as open data.

8) The experience of the IGN during the years in which it commercialized geographic data was that this information policy was a very significant brake on research, even though discounts of up to 90 % were available for R&D uses, given the permanent scarcity of resources for this purpose. Many research projects were unfeasible, so open data are actually a stimulus to research, even a promoter of it.

9) It is a real social demand that has been voiced for some years within the GI sector in certain media and social networks. We should mention the campaign for the release of geographic data sustained by The Guardian in the United Kingdom and the emergence in London in 2004 of the OpenStreetMap phenomenon, in reaction, among other reasons, to the closed data policy maintained by the Ordnance Survey. Curiously, this is one of the few demands on which the right and the neo-liberals, who want a lean public sector at the service of a private sector to which it transfers its data, agree with the left, who conceive of an administration at the service of citizens that provides all the data it manages.

10) Finally, it should be noted that open data always generate very relevant intangible returns: the use of the data in research projects and prestigious international initiatives, an improved corporate image, a greater presence on the web and in social networks, synergies with other application sectors, and so on. And sometimes the indirect benefits are not so intangible, as when opening up the data encourages the development of applications that later prove useful to the data producer in its own production processes.

All this, together with other reasons, makes it generally advisable for a producer of official geographic data to adopt an open data policy. We also understand that in some cases public bodies cannot do so, because the legal framework in force prevents it or because their political and administrative situation forces them to self-finance wholly or in part, making it inevitable that they seek an economic return for the use of their data. In those cases it must be remembered that, given the high cost of generating geographic data, the return obtained from selling them at market prices covers only a small part of that cost.

In summary and as I have already mentioned, I believe that technicians and public officials must be committed to progress and have the moral obligation to push technological revolutions to make them profitable, trying to minimize the adverse side effects that may occur.

Interview

CRAs Aragón shows the distribution and evolution of the Grouped Rural Schools (Colegios Rurales Agrupados) of Aragon. This initiative received the "Most Novel Idea" and "Participants' Vote" awards at Jacathon 2014.

Javier Rubio, representing his team (made up of Dani Latorre, Jesús Varón and Rafael Ramos), has shared his personal view on the opportunities and challenges of the open data sector in Spain.

As members of the winning teams of the Jacathon event, you know the world of open data and its re-use at first hand. What opportunities does open data offer for business development in Spain, and what aspects need to improve to keep advancing in the openness of information?

From my point of view, moving forward requires a change of mentality at several levels. To create a real impact that opens the door to new business opportunities (currently difficult, partly because of the poor quality, in form and substance, of the published data), the entities involved do not need to reinvent the wheel; it is enough to learn from the lessons of other countries:

- Change the mindset about granularity (because of privacy). There is a widespread obsession with not publishing certain data at a useful level of granularity. For example, it is impossible to find education data that is good for anything, on the pretext of protecting minors' privacy or of not branding schools as bad. If we looked more at the experiences of other countries, we would learn that it is feasible to publish useful, granular, fully anonymized data without invading privacy; what is more, we could analyze the positive impact that the need to improve has on schools. But I know of no entity involved in open data that even dares to propose an in-depth study of certain experiences in other countries.

Examples of data that nobody here dares to publish now, but that are being published in other countries, would be daily geolocated crimes, exam results broken down by school, visitor logs of public buildings (as a citizen I want to know if a political leader met with a businessman and a month later legislation changed the rules of that businessman's sector, for example), the land registry, the companies registry (both without fees)... Some links to projects:

- Example of a project publishing school results in London: berglondon.com/projects/schooloscope

- Example of crime data granularity in the USA, which enabled business development that is unthinkable here because of the lack of granularity and frequency of publication of similar data: trulia.com/real_estate/New_York-New_York/crime

- Change the mindset about the dimensions of the data to be published. Open data achievements at the local level usually fall within the dimension of citizen services, specifically public transport (publishing real-time bus and other transit timetables, for example). However, progress in the other dimensions of open data, such as the publication of public contracts or transparency in general, is laughable. The variety of publication formats caused by fragmentation is such that processing the data is a titanic effort, which stifles any possible business innovation around it.

- Change the mindset of merely complying with the law (and nothing more). What I have observed, and this is more of a personal perception, is that right now all open data in Spain depends on the effort of dedicated civil servants who understand the huge impact it can achieve. That is, without those people there is no open data at the local level (and the Transparency Law does not solve this). What is even worse, when these projects are carried out, other people involved in the administration, the ones in charge of IT, only care about complying with the law (and nothing more); that is, if the law does not oblige them to publish the information in a structured, easy-to-consume way, then who cares. What seems to matter is publishing datasets, regardless of their quality of content (and finding the interesting ones among thousands of useless sets is looking for a needle in a haystack) or their quality of form (such as a PDF that is a scanned image rather than text), just to pin on empty medals and appear in the photo. That makes business development around open data impossible.
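The point about structured, easy-to-consume publication can be made concrete: a dataset released as plain CSV can be processed with a few lines of standard-library code, which is impossible with a PDF that is only a scanned image. The school names and figures below are invented for illustration.

```python
import csv
import io

# An illustrative CSV as a data portal might publish it (rows invented):
raw = "school,students\nCRA Example One,120\nCRA Example Two,95\n"

# Structured text is machine-readable with the standard library alone.
rows = list(csv.DictReader(io.StringIO(raw)))
total = sum(int(r["students"]) for r in rows)
print(len(rows), total)  # 2 215
```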

 

As users and re-users of open data, what barriers have you encountered when re-using public information, and what measures do you think should be adopted in the medium term to boost re-use in Spain?

The fundamental barrier is, as I have already partly described in the previous question, the poor quality of the published data, its frequent uselessness (quantity over quality is the current reality), its fragmentation across dozens of micro-portals and APIs, and the apathy about complying with the existing standards for structuring and publishing data that make it easy to use and consume.

As for measures to boost re-use, I think a golden opportunity was wasted with the Transparency Law (although it can still be improved through the regional laws now in the process of publication). The Transparency Law not only fails to require data to be published in an adequate form (which would have eliminated the problem at its root); it also does not solve fragmentation, allows unbelievable exceptions for not publishing data, and imposes a cumbersome process for requesting new data from the government (electronic ID, more forms...).

A good measure would be to prohibit the sale of public data, requiring the data that currently must be paid for to be published free of charge and under a license allowing free redistribution and use. Specifically, I am talking about the Land Registry, the Companies Registry, or the Cendoj, currently controlled by lobbies of registrars who profit from selling data that is free at every level in other countries. This data includes things like information on public and private companies or, in the case of the Cendoj, all Supreme Court rulings (currently searchable with its own search engine, but paid for re-use). This clearly holds back innovation and the development of business initiatives (not to mention the citizen's right to know such data freely).

"The poor quality of the published data, its frequent uselessness (...) and its fragmentation across dozens of micro-portals and APIs."

 

Hackathons have become the perfect occasion for developers, entrepreneurs and infomediaries to meet representatives of private entities and public bodies who can support their projects. What opportunities does being one of the Jacathon winners offer you, and what will your next steps be?

The good thing about hackathons is that they generate functional prototypes in a very short time, so they are a seed of innovation; they also push professionals from multiple disciplines to investigate and learn about Spanish open data. Personally, winning the Jacathon has opened my eyes and has given me, for example, the opportunity to talk to related entities (as in the case of this interview) that have it in their hands to improve the current situation. Unfortunately, a hackathon rarely leads to continuing the idea unless a network is woven around organizers and participants to create continuity or generate subsequent parallel projects.

I am not clear about my next steps, because building applications on top of today's open data is frankly difficult and risky as a business project. However, I know I will keep studying the data sources, proposing improvements, being critical (and being critical does not mean failing to appreciate the effort made at multiple levels) and collaborating as far as I can with the world of open data, which can have such a great social impact if properly fostered.
