Interview

In this episode we will delve into the importance of three related categories of high-value datasets. These are Earth observation and environmental data, geospatial data and mobility data. To tell us about them, we have interviewed two experts in the field:

  • Paloma Abad Power, deputy director of the National Centre for Geographic Information (CNIG).
  • Rafael Martínez Cebolla, geographer of the Government of Aragón.

With them we have explored how these high-value datasets are transforming our environment, contributing to sustainable development and technological innovation.

Listen to the full podcast (only available in Spanish)

Summary / Transcript of the interview

1. What are high-value datasets and why are they important?

Paloma Abad Power: According to the regulation, high-value datasets are those that offer the highest socio-economic potential and, for this, they must be easy to find, i.e. they must be accessible, interoperable and usable. What does this mean? It means that the datasets must have descriptions, i.e. online metadata, which report their statistics and properties, and that they can be easily downloaded or used.

In many cases, these are reference data, i.e. data that serve to generate other types of data, such as thematic data, or can generate added value.

Rafael Martínez Cebolla: They could be defined as those datasets that represent phenomena that are useful for decision making, for any public policy or for any action that a natural or legal person may undertake.

In this sense, there are already some directives, which are not so recent, such as the Water Framework Directive or the INSPIRE Directive, which motivated this need to provide shared data under standards that drive the sustainable development of our society.

2. These high-value data are defined by a European Directive and an Implementing Regulation which dictated six categories of high-value datasets. On this occasion we will focus on three of them: Earth observation and environmental data, geospatial data and mobility data. What do these three categories of data have in common and what specific datasets do they cover?

Paloma Abad Power: In my opinion, these data have in common the geographical component, i.e. they are data located on the ground and therefore serve to solve problems of different kinds that are linked to society.

Thus, for example, we have, with national coverage, the National Aerial Orthophotography Plan (PNOA), which provides the aerial images, the System of Land Occupation Information (SIOSE), cadastral parcels, boundary lines, geographical names, roads, postal addresses, protected sites (which can be environmental or, like castles, historical heritage), etc. These categories cover almost all the themes defined by the annexes of the INSPIRE Directive.

Rafael Martínez Cebolla: It is necessary to know what is pure geographic information, with a direct geographic reference, as opposed to other types of phenomena that have indirect geographic references. In today's world, 90% of information can be located, either directly or indirectly. Today more than ever, geographic tagging is mandatory for any corporation that wants to implement a certain activity, be it social, cultural, environmental or economic: the implementation of renewable energies, where I am going to eat today, etc. These high-value datasets enhance these geographical references, especially of an indirect nature, which help us to make a decision.

3. Which agencies publish these high-value datasets? In other words, where could a user locate datasets in these categories?

Paloma Abad Power: It is necessary to highlight the role of the National Cartographic System, which is an action model in which the organisations of the General State Administration and the autonomous communities participate. It coordinates the co-production of many unique products, funded by these organisations.

These products are published through interoperable web services. They are published, in this case, by the National Centre for Geographic Information (CNIG), which is also responsible for much of the metadata for these products.

They could be located through the Catalogues of the IDEE (Spatial Data Infrastructure of Spain) or the Official Catalogue of INSPIRE Data and Services, which is also included in datos.gob.es and the European Data Portal.

And who can publish? All bodies that have a legal mandate for a product classified under the Regulation. Examples: all the mapping bodies of the Autonomous Communities, the General Directorate of Cadastre, Historical Heritage, the National Statistics Institute, the Geological and Mining Institute (IGME), the Hydrographic Institute of the Navy, the Ministry of Agriculture, Fisheries and Food (MAPA), the Ministry for Ecological Transition and the Demographic Challenge, etc. There are a multitude of organisations and many of them, as I have mentioned, participate in the National Cartographic System, provide the data and generate a single service for the citizen.

Rafael Martínez Cebolla: The National Cartographic System defines very well the degree of competences assumed by the administrations. In other words, the public administration at all levels provides official data, assisted by private enterprise, sometimes through public procurement.

The General State Administration goes up to scales of 1:25,000 in the case of the National Geographic Institute (IGN) and then the distribution of competencies for the rest of the scales is for the autonomous or local administrations. In addition, there are a number of actors, such as hydrographic confederations, state departments or the Cadastre, which have under their competences the legal obligation to generate these datasets.

For me it is an example of how it should be distributed, although it is true that it is then necessary to coordinate very well, through collegiate bodies, so that the cartographic production is well integrated.

Paloma Abad Power: There are also collaborative projects, such as, for example, a citizen map, technically known as an X, Y, Z map, which consists of capturing the mapping of all organisations at national and local level. That is, from small scales such as 1:1,000,000 or 1:50,000,000 to very large scales, such as 1:1,000, to provide the citizen with a single multi-scale map that can be served through interoperable and standardised web services.
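
As an illustration of how an X, Y, Z map addresses its content, here is a minimal sketch. It assumes the widely used Web Mercator tiling convention, in which zoom level z splits the world into 2^z by 2^z tiles; the interview does not say which tiling scheme the citizen map uses. The function names, the URL template and the sample coordinates are hypothetical.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) tile indices containing a WGS84 point.

    Assumes the common Web Mercator XYZ scheme, with tile (0, 0)
    in the top-left corner of the world map.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url(template, zoom, x, y):
    """Fill a {z}/{x}/{y} URL template (the template itself is a placeholder)."""
    return template.format(z=zoom, x=x, y=y)


if __name__ == "__main__":
    # Approximate coordinates of Zaragoza, used only as sample input.
    zoom = 12
    x, y = lonlat_to_tile(-0.88, 41.65, zoom)
    print(tile_url("https://tiles.example.org/{z}/{x}/{y}.png", zoom, x, y))
```

Low zoom levels correspond to the small scales mentioned above and high zoom levels to the large ones, which is how a single service can present a multi-scale map.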

4. Do you have any other examples of direct application of this type of data?

Rafael Martínez Cebolla: A clear example was seen with the pandemic, with the mobility data published by the National Statistics Institute. These were very useful data for the administration, for decision making, and from which we have to learn much more for the management of future pandemics and crises, including economic crises. We need to learn and develop our early warning systems.

I believe that this is the line of work: data that is useful for the general public. That is why I say that mobility has been a clear example, because it was citizens themselves who were informing the administration about how they were moving.

Paloma Abad Power: I am going to contribute some data. For example, according to statistics from the National Cartographic System services, the most demanded data are aerial images and digital terrain models. In 2022 there were 8 million requests and in 2023 there were 19 million requests for orthoimages alone.

Rafael Martínez Cebolla: I would like to add that this increase is also because things are being done well. On the one hand, discovery systems are improved. My general feeling is that there are many successful example projects, both from the administration itself and from companies that need this basic information to generate their products.

There was an application that was generated very quickly during the de-escalation - you went to a website and it told you how far you could walk through your municipality - because people wanted to get out and walk. This example arises from spatial data that have moved out of the public administration. I believe that this is the importance of successful examples, which come from people who see a compelling need.

5. And how do you incentivise such re-use?

Rafael Martínez Cebolla: I have countless examples. Incentivisation also involves promotion and marketing, something that has sometimes failed us in the public administration. You stick to certain competences and it seems that just putting it on a website is enough. And that is not all.

We are incentivising re-use in two ways. On the one hand, internally, within the administration itself, showing staff that geographic information is useful for planning and evaluating public policies. One example is the Public Health Atlas of the Government of Aragon, which received an award from an Iberian society of epidemiology the year before the pandemic. It was useful for them to know what the health of the Aragonese was like and what preventive measures they had to take.

As for the external incentives, in the case of the Geographic Institute of Aragon, it was seen that the profile entering the geoportal was very technical. The formats used were also very technical, which meant that the general public was not reached. To solve this problem, we promoted portals such as the IDE didactica, a portal for teaching geography, which reaches any citizen who wants to learn about the territory of Aragon.

Paloma Abad Power: I would like to highlight the economic benefit of this, as was shown, for example, in the economic study carried out by the National Centre for Geographic Information with the University of Leuven to measure the economic benefit of the Spatial Data Infrastructure of Spain. It measured the benefit to private companies of using free and open services, rather than using, for example, Google Maps or other non-open sources.

Rafael Martínez Cebolla: For better and for worse, because we sometimes wish the quality of the official data were better. Both Paloma in the General State Administration and I in the regional administration know that there are official data where more money needs to be invested so that their quality improves and they can be reused.

But it is true that these studies are key to know in which dimension high-value datasets move. That is to say, having studies that report on the real benefit of having a spatial data infrastructure at state or regional level is, for me, key for two things: for the citizen to understand its importance and, above all, for the politician who arrives every N years to understand the evolution of these platforms and the revolution in geospatial information that we have experienced in the last 20 years.

6. The Geographic Institute of Aragon has also produced a report on the advantages of reusing this type of data, is that right? 

Rafael Martínez Cebolla: Yes, it was published earlier this year. We have been doing this report internally for three or four years, because we knew we were going to make the leap to a spatial knowledge infrastructure and we wanted to see the impact of implementing a knowledge graph within the data infrastructure. The Geographic Institute of Aragon has made an effort in recent years to analyse the economic benefit of having this infrastructure available for the citizens themselves, not for the administration. In other words, how much money Aragonese citizens save in their taxes by having this infrastructure. Today we know that having a geographic information platform saves approximately 2 million euros a year for the citizens of Aragon.

I would like to see the report for the next January or February, because I think the leap will be significant. The knowledge graph was implemented in April last year and this gap will be felt in the year ahead. We have noticed a significant increase in requests, both for viewing and downloading.

Basically, from one year to the next, we have almost doubled both the number of accesses and downloads. This affects the technological component: you have to redesign it. More people are discovering you, more people are accessing your data and, therefore, you have to dedicate more investment to the technological component, because it becomes the bottleneck.

7. What do you see as the challenges to be faced in the coming years?

Paloma Abad Power: In my opinion, the first challenge is to get to know the user in order to provide a better service: the technical user, the university students, the users on the street, etc. We are thinking of running a survey at the point where the user is about to use our geographic information. But of course, such surveys sometimes slow down the use of geographic information. That is the great challenge: to know the users in order to make services, applications, etc. more user-friendly, and to understand what they want and give it to them better.

There is also another technical challenge. When the spatial data infrastructures began, the technical level required was very high: you had to know what a visualisation service was, what metadata were, what the parameters were, etc. This barrier has to be removed, so that the user can simply say, for example, "I want to consult and visualise the length of the Ebro river", in a more user-friendly way. Or take the word LiDAR, which referred to the high-accuracy digital terrain model. All these terms need to be made much more user-friendly.

Rafael Martínez Cebolla: Above all, let them be discovered. My perception is that we must continue to promote the discovery of spatial data without having to explain to the untrained user, or even to some technicians, that there must be data, metadata, a service... No, no. Basically, generalist search engines should be able to find high-value datasets without the user knowing that there is such a thing as a spatial data infrastructure.

It is a matter of publishing the data under friendly standards, in accessible versions and, above all, at permanent URIs, which are not going to change. In other words, the data will improve in quality, but their URIs will never change.

And above all, from a technical point of view, both spatial data infrastructures and geoportals and knowledge infrastructures have to ensure that high-value information nodes are related to each other from a semantic and geographical point of view. I understand that knowledge networks will help in this regard. In other words, mobility has to be related to the observation of the territory, to public health data or to statistical data, which also have a geographical component. This geographical semantic relationship is key for me.

Interview clips

1. What are high-value datasets and why are they important?

2. Where can a user locate geographic data?

3. How is the reuse of data with a geographic component being encouraged?

Interview

Data has great power to transform society. Its capacity to generate knowledge, drive innovation and empower citizens is undeniable. In particular, open government data is a resource with which to address major environmental, social and economic challenges from an innovative perspective.

In this sense, public administrations, including the autonomous communities, are organising competitions to promote the data culture. To tell us about these initiatives we have interviewed:

  • Sonia Gómez Martín, head of the Transparency and Information Re-use Service of the Government of Castilla y León.
  • Imanol Argüeso Epelde, head of projects at the Basque Government's Directorate for Citizen Services and Digital Services.

Listen to the full podcast (only available in Spanish)

Summary / Transcript of the interview

1. To begin with, you can briefly present your data initiatives. What kind of data and contents can we find in the Open Data Euskadi platform? And on the Junta de Castilla y León's Open Data platform?

Imanol Argüeso Epelde: In OpenData Euskadi, the Basque Government's open data initiative, there is a catalogue of around 12,000 datasets from the Basque Government, the three provincial councils of the Autonomous Community of Euskadi - which are Vizcaya, Guipúzcoa and Álava - and the three capitals of these territories. By means of a federation system, all their datasets are displayed in the catalogue.

In addition, there is a community section, where we show news that we consider relevant to the world of open data. We also have a section for competitions and examples of products that have been made with our data.

Sonia Gómez Martín: All of this is similar to what can be found on the Junta de Castilla y León's open data platform. In our case, the open data catalogue only includes data from the Autonomous Community administration itself, not from the different provincial councils or provincial capitals.

In addition to the data catalogue, we have a visualisations portal, where we accommodate data with a large volume of information and where visualisations and API queries can be made. These data are thematic: there are up to 21 categories such as health, public sector, culture, leisure, rural environment and fisheries, and so on.

2. What activities are you carrying out to promote the re-use of this data?

Sonia Gómez Martín: The main activity in recent years has been the organisation of the annual Open Data Competition, through which we encourage reusers to use at least one dataset from our catalogue to create products, services and teaching resources.

There are also a number of other internal activities. For example, courses are run with our training school for internal staff of the Junta, so that they know the importance of reusing information generated within the public sector and making open data available to citizens and businesses.

In addition, there is a news section on the portal and we also receive requests for the dissemination of applications or the opening of data.

Imanol Argüeso Epelde: We also give courses within the Basque Government and to other administrations. For example, this year we have given one to the Provincial Council of Álava. We also have an initiative called Aula Open Data at the University of the Basque Country, located at the School of Engineering in Bilbao. It is a business classroom designed for students to use open data, build applications, visualisations and services derived from the data, and learn about this tool for their future professional activity.

We also participate in any event, conference, talk, etc. When an event related to open data comes up, we usually participate.

3. You have already introduced us to the data contests you organise. Can you tell us a bit more about each of them?

Imanol Argüeso Epelde: In the case of the Basque Country, there are two calls: one for applications and the other for ideas. The registration period for the 5th edition of the two calls is now open, and ends on 10 October.

In the case of the applications competition, any product derived from the open data of any of the catalogues of the Basque Government, provincial councils and the three capitals of the Basque Country will be awarded. It is mandatory to use some dataset from these catalogues. All that is requested is a URL with the service or product to be developed and a short document describing the project.

In the case of the call for ideas, a document explaining an idea for an open data product is needed.

We distribute around €34,000 in prizes across different categories.

It is also important to note that, although it is organised by the Basque Government as such, the three provincial councils and the three town councils of the Basque capitals collaborate: they participate in the jury, help us with promotion, etc.

Sonia Gómez Martín: In our case, it is a single call, but there are four categories. A category of ideas is also established, similar to that of the Basque Government. Another one for products and services that is also similar to the one Imanol mentioned: we are looking for an application or URL where a website is developed that uses some dataset from our catalogue. And then there are two additional categories. One for teaching resources, which seeks to encourage the creation of new and innovative open teaching resources using datasets from our portal to support classroom teaching. And another category of data journalism, which seeks to reward journalistic pieces published or updated in a relevant way in any written or audiovisual medium, where the information takes into account open datasets from our catalogue.

We give away €12,000 in prizes in total. And well, right now we have the 8th edition open until 23 September 2024.

4. What are the requirements for participation?

Sonia Gómez Martín: Entries must not have previously been awarded prizes in other competitions. In all categories it is necessary to use at least one data source from the catalogue of the Junta de Castilla y León's open data portal. And the same person can submit several nominations in different categories.

In the case of data journalism, pieces must have been published on or after the last day on which nominations could be submitted the previous year, which in this case is 3 October 2023.

In the case of the products and services category, there are awards for students, where the applicant must be a student enrolled in the 2023-2024 or 2024-2025 academic year.

Imanol Argüeso Epelde: The case of the Basque Country is similar. It is requested that some dataset from the public open data catalogues we have discussed be used: from the three provincial councils, the three capitals or the Basque Government. In the case of applications, it is also necessary to develop some kind of application, visualisation or website based on this open data.

Both competitions are open to any private individual, professional or even any company.

I would like to take this opportunity to encourage people. The deadline is 10 October and anyone interested still has time to submit an idea or generate a product.

5. And what has been the impact of these competitions? Can you give us some examples of solutions, ideas or products that have been submitted to the competition?

Sonia Gómez Martín: There are very interesting things, especially in what the students bring to the table. In the editions in which I have been part of the jury, I have seen, for example, an application, a website, which included the entire offer of vocational training in Castilla y León. Also an analysis of energy data which I found very interesting. In addition, some institutes have submitted and won awards for initiatives based on agricultural information catalogues. They made a small analysis of the peculiarities of our territory.

Imanol Argüeso Epelde: The truth is that most of the products that are generated are no longer active. But there are some very interesting examples that still work today. To cite one, the last edition featured a website called Openslot, which offers information on gaming and recreational machines in the Basque Country (manufacturers and machine models) and also makes predictions. It is a very sector-specific application.

Another example: last year, a Telegram group that relied on open data to provide information on which time slots are best for energy consumption was the winner and is still active. There are some products that last over time and others that are developed only for competitions.

6. What advice would you give to other public bodies wishing to launch such initiatives?

Imanol Argüeso Epelde: Above all, I would stress the importance of dissemination, of promoting the competitions in training centres, in universities related to information technologies.

It has also worked for us to include a voting system so that people can vote on the nominations. And this year we have included different categories by theme, in the case of ideas. In the case of applications, it is assessed whether access to the data is via an API or the SPARQL endpoint. What we want to do is diversify and make more people eligible for the prize.
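
To make the idea of accessing data through a SPARQL endpoint concrete, here is a minimal sketch. It assumes an endpoint that follows the standard SPARQL Protocol (a GET request carrying a `query` parameter) and a catalogue described with the DCAT vocabulary. The endpoint URL is a placeholder and the function name is hypothetical; neither is taken from Open Data Euskadi's documentation.

```python
from urllib.parse import urlencode

# Placeholder address: replace with the real endpoint of the portal in question.
ENDPOINT = "https://example.org/sparql"

# Assumes the catalogue is described with DCAT; this lists ten dataset titles.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
} LIMIT 10
"""


def build_sparql_url(endpoint, query):
    """Return the GET URL for a SPARQL query (SPARQL Protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    # The URL can then be fetched with any HTTP client, typically sending the
    # header "Accept: application/sparql-results+json" to get JSON results.
    print(build_sparql_url(ENDPOINT, QUERY))
```

An application that reads the data this way always queries the current catalogue rather than a downloaded copy, which is presumably why this kind of access is taken into account in the assessment.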

Sonia Gómez Martín: I would like to insist on what Imanol said about promotion. It is very important to make universities aware of the competitions and to encourage them to participate. You can also publicise them on social networks, on your portal datos.gob.es, etc. Little by little, everything helps to make them known and to increase the number of participants.

7. These competitions are a window to listen to the needs of re-users. Have you taken any concrete action as a result of this feedback?

Sonia Gómez Martín: We have, on the open data portal itself, a section where we receive requests from re-users on what types of open data they would like to have. We receive them and we pass them on, but it is true that internally we sometimes have trouble getting the requested data to materialise. It is not always easy for the management centre on which these data depend to convert them into open or even structured data formats.

On social media we also have an account on X, @transparencia, where we also receive requests, evaluate them and study them.

Imanol Argüeso Epelde: Yes, it is true. Open data areas are often mere transmitters and it is sometimes difficult to make requests materialise. I think that one of the great advantages of the competitions is that, internally, they are a very interesting source of information: they let us listen to the reusers, to see what problems they have, what tools they use, what characteristics they have... and this allows us to focus our efforts.

Following this source of information, we have opened up certain datasets. The example I mentioned, Openslot, uses data that was not open and that we opened as a result of this participant. We have also developed several REST APIs based on the most demanded data: meteorological data, air quality, water quality, etc.

Interview clips

1. What does the Euskadi open data contest consist of? (only available in Spanish)

2. What is the Castilla y León open data contest? (only available in Spanish)
