Interview with Alberto Gonzalez Yanes, Head of the Economic Statistics Service of the Canary Islands Statistics Institute (ISTAC).

Date: 19-10-2022

Name: Alberto González Yanes

Sector: Science and technology

Organization, Institution or Company: The Canary Islands Statistics Institute

Country: Spain

The Canary Islands Statistics Institute (ISTAC) is the central body of the autonomous statistical system and the official research center of the Government of the Canary Islands. It is in charge of providing statistical information of interest related to the autonomous community, taking into account the singularities of the territory. It also coordinates public statistical activity, facilitating its promotion and management.

Alberto González Yanes, Head of the Economic Statistics Service of the ISTAC, has talked to datos.gob.es to explain how they work and the impact of the data they hold.

Full interview:

1. Statistical data are considered high-value data by the EU. Moreover, the UN itself has highlighted the importance of having initiatives that generate data focused on local realities. Why do you consider this type of data to be so valuable? What is its potential impact?

Regarding the importance of statistical data, we have to take into account a little-known issue: they generate duties and rights for citizens, but also for States. For example, we are seeing it right now with the CPI, which entails the duty to pay higher rent and, in turn, the right to obtain a higher salary.

In addition, they are instruments with which States equip themselves to understand reality in an objective and independent way. It is important to highlight this role of public statistics as distinct from that of other public data, which do not carry the same value from the point of view that concerns us. Not for nothing does public statistics appear throughout the constitutional framework, from the Constitution itself (with its own article) to the different Statutes of Autonomy. Our legislation establishes independent bodies for the production of statistical data and does so through a model that could be considered federal, with at least eighteen systems: one for state purposes and seventeen for autonomous and local purposes.
 
Decentralization is a very important element because it allows support for data-based decision-making in each territory, while state production only reaches, in the best of cases, a provincial scale. If we want a society that generates rights and duties at the autonomous, provincial, island, municipal, and even submunicipal levels, it is essential to support them with reliable local data.
 
With respect to their impact, we have a current and very significant example. The ISTAC has just published the Registered Active Population Statistics (EPA-Reg), which produces data on the active population at the submunicipal level. This represents a qualitative leap with respect to the EPA, which only reaches a provincial or, at most, insular level for some indicators required by Eurostat and prepared by the Institute under agreement with the INE.
 
Decentralization is a very important element because it allows support for data-based decision making in each territory.
 
In EPA-Reg, we construct indicators that approximate the concepts offered by the International Labor Organization when measuring the population and its relationship with economic activity. In this way, information is provided for each neighborhood and each town in the Archipelago. And, undoubtedly, the Local Development Agents, the Employment Councils and the Canary Islands Employment Service itself need these data to make decisions and carry out a better intervention on a small scale.
 
2. The ISTAC has been generating a large amount of statistical data for more than 30 years. How was the process of incorporating the open data philosophy into your daily activity? What challenges did you encounter and how did you overcome them?

Open data has two key elements: on the one hand, making data publicly available; on the other, presenting it in an open format that third parties can easily reuse.

Regarding the first aspect, openness of data is in the genesis of public statistics at the international level. All statistical legislation contemplates the obligation to publish data: the European Statistics Regulation, Law 12/89 on the Public Statistical Function or, as far as the ISTAC is concerned, Law 1/91 on Public Statistics of the Autonomous Community of the Canary Islands. But disseminating data is much more than an obligation; it is the very reason statistics exist. Moreover, it must be done on equal terms for everyone, planned over time, with a calendar known in advance, to guarantee transparency and citizens' confidence, as well as the certainty of being able to use these results for decision-making.

A different issue is the formats in which these data were published, which were often closed: PDF, Excel and many others. It is true that statistical offices are used to working with a huge amount of data, and this necessarily implies the need to metadocument it in order to manage it properly. But good management does not, by itself, mean that all this information is open.

We want to make the information we hold easy to use, so semantic standardization is built in from the beginning: all datasets are well structured so that they can be reused.
 
We have to keep in mind that the first reusers of the data we published were, and still are, ourselves. Many times, government departments, and the ISTAC itself, suffered from format changes or, for example, from having to rescue information from a non-editable PDF, something unsustainable. So, in practice, and even from a self-interested perspective, the need for open formats for better data management became clear to us.
 
And, while we were in that process, open data burst onto the international stage, which fitted very well with the moment ISTAC was at, so we decided to move forward on that path. Just as we needed good formats and an optimal organization of all the information, we had to offer the same advantages to end users, to citizens. Therefore, from the very beginning, the redefinition of the Institute's entire dissemination strategy incorporated this need, which linked perfectly with the whole open data culture.
 
And so, around 2008, we designed a complete public data management system that would allow good metadata management, which has led us, for example, to have 85 metadata fields for each dataset, of which only a part is disseminated externally. At that stage we also began to metadocument the datasets structurally, with a first semantic approach (classifications, codes, concepts, etc.). That was the genesis and, over time, through different projects, we have managed to incorporate the culture of data openness from the design stage, from the moment a statistical product is conceived until it is disseminated. We want to make the information we hold easy to use, so semantic standardization is built in from the beginning, all datasets are well structured so that they can be reused, and we are always thinking about how dissemination will work, to facilitate not only accessibility but also usability by third parties.

The main challenges, initially (2005-2006), were internal and technological. We did not have an organizational culture of data management or standardized metadocumentation. Nor were there enough standards or applications on the market to let us address the problem. So what we did, through various projects, several of them with European funding, was to build a complete data infrastructure with different technologies. During this phase we equipped ourselves with internal standards, adopting international ones such as SDMX (Statistical Data and Metadata eXchange) or DCAT-AP, among others. All in all, we were laying out and building the path we had to follow, which has led us to a very powerful data management system today.
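The combination of catalogue metadata (in the spirit of DCAT-AP) with structural, SDMX-style metadata can be illustrated with a minimal sketch. The identifiers, URLs, field set and code lists below are hypothetical examples, not ISTAC's actual schema:

```python
import json

# Illustrative, DCAT-AP-flavoured record for one dataset.
# Property names follow the DCAT/DCT vocabularies; every concrete
# value (ids, URLs, code lists) is a made-up example.
dataset_record = {
    "@type": "dcat:Dataset",
    "dct:identifier": "example-epa-reg",   # hypothetical dataset id
    "dct:title": "Registered Active Population Statistics (EPA-Reg)",
    "dct:publisher": "ISTAC",
    # Structural metadata in the spirit of SDMX: dimensions are drawn
    # from shared code lists, so every dataset uses the same semantics.
    "structure": {
        "dimensions": ["TIME_PERIOD", "GEO", "SEX"],
        "codelists": {"GEO": "CL_AREA", "SEX": "CL_SEX"},
    },
    # The same dataset exposed through several distributions.
    "dcat:distribution": [
        {"dct:format": "application/json",
         "dcat:accessURL": "https://example.org/api/datasets/example-epa-reg"},
        {"dct:format": "text/csv",
         "dcat:accessURL": "https://example.org/files/example-epa-reg.csv"},
    ],
}

print(json.dumps(dataset_record, indent=2))
```

The point of the sketch is the separation of concerns: the catalogue-level fields describe the dataset for discovery, while the `structure` block ties its dimensions to shared code lists so that any two datasets using `CL_AREA` mean the same thing by a territory code.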

3. What is your open data governance process like and what kind of profiles do you have in the team?

We are something like a data factory that covers the entire data life cycle: capture; processing (cleaning, cleansing, imputation, integration, georeferencing); generation of information at microdata scale; anonymization; and generation of data of all kinds (not only aggregated data in cubes, but also dashboards, geographic information...). Therefore, we cover the entire business spectrum within data governance which, as I said before, is crossed from start to finish by the culture of data openness. We know that what we produce is meant to be made available to the citizenry. So we have many types of profiles within the organization:

  • Surveyors: the work of all the staff in the field is very important, although we don't always emphasize it.
  • Traditional profiles of statistical technicians.
  • Those that have been incorporated in recent times, linked to data architecture, data engineering, data science and specialists in geographic information systems.
  • And, recently, we are incorporating professionals linked to data communication because we have a huge production and dissemination of data, but we want to advance in a fundamental aspect, which is dissemination. The public has the right not only to access, but also to understand the information we produce, so we need to do important work in this regard.

4. The ISTAC is making a strong commitment to facilitate automated access to data through APIs. What impact is this strategy having in terms of data reuse? Do you consider that access via API in combination with downloading data files is the way forward for publishers of statistical data or is one of the two alternatives the preferred one for the type of user who consumes this category of data?

Regarding the use of APIs: from the very beginning, when we started planning our data technology structure back in 2008, even before APIs were contemplated in the current Reuse Directive, we decided that all our information would be supported by an API ecosystem. And so it is: we have about eight public APIs, with different methods, and we will keep expanding them. We believe in this strategy so much that our own applications are users of our APIs. That means we don't build parallel APIs for public consumption; our systems consume the same APIs the public does. This is important because, being the first reuser of your own APIs, you discover the limitations and problems of all kinds that may arise when disseminating data through them.
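From a reuser's side, consuming such an API typically means fetching JSON and flattening it into rows. The endpoint and response shape below are hypothetical illustrations, not ISTAC's actual API:

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint: real API paths and schemas will differ.
EXAMPLE_URL = "https://example.org/api/datasets/example-epa-reg/data"

def parse_observations(payload: dict) -> list[dict]:
    """Flatten a (hypothetical) JSON response into one dict per observation."""
    dims = payload["dimensions"]          # e.g. ["GEO", "TIME_PERIOD"]
    return [dict(zip(dims, keys), value=value)
            for keys, value in payload["data"]]

def fetch_observations(url: str = EXAMPLE_URL) -> list[dict]:
    """Download and parse a dataset, assuming the illustrative schema above."""
    with urlopen(url) as resp:
        return parse_observations(json.load(resp))

# Example with an inline payload instead of a live request:
sample = {"dimensions": ["GEO", "TIME_PERIOD"],
          "data": [[["ES70", "2022"], 1100000]]}
rows = parse_observations(sample)
print(rows)  # [{'GEO': 'ES70', 'TIME_PERIOD': '2022', 'value': 1100000}]
```

Splitting the parser from the HTTP call mirrors the "first reuser" point: the same parsing code can back both the institute's own viewers and third-party clients.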

Regarding the impact, we found that it is not enough to make APIs available to the public. Often, the people who access them work with data analytics applications such as Tableau, Power BI, QGIS or Qlik, among other commercial and non-commercial tools. So, once we had made the APIs available, we decided to include connectors for these types of applications to make it easier to move the data into those analytics systems.

The impact of this instrumentation has been quite powerful because it has made it easier for administrations and private companies to reuse the information published via APIs. Thus, we can find many dashboards all over the Canary Islands that use these connectors, especially in the tourism sector. Among local entities, for example, the Socioeconomic Observatory of the City Council of Santa Cruz de Tenerife has a Tableau dashboard, updated through our APIs, with all the municipal indicators. Similarly, there are different experiences in the private sector. We believe that the ecosystem of APIs plus connectors, that tandem, is having an important impact in democratizing access to ISTAC data by third parties, mainly within the public sector itself.

Once we had made the APIs available, we decided to include connectors for all these types of applications to make it easier to move the data into those analytics systems.

In general, we did not get into the dilemma of whether downloading files is better than using APIs. For ISTAC, the download itself is an API method, since any dataset can be queried hot or requested as a download. The question is not so much the method as the logic of need. For example, when we have the microdata files of a survey, does it make sense to serve them via API? It does, but the logical thing is not to consume them that way but as a download, to be uploaded later to the environments in which the microdata will be analyzed. In this regard, our roadmap includes bulk systems for the massive automatic download of all the datasets linked to a given request.
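The idea that "the download itself is an API method" can be sketched as a thin client that builds either a hot-query URL or a full-file download URL for the same dataset. All routes and parameter names below are hypothetical:

```python
# Sketch of "download is just another API method": the same dataset id
# can be queried hot (a filtered slice) or fetched whole as a file.
# BASE and both route shapes are made-up examples, not real ISTAC routes.

BASE = "https://example.org/api"

def query_url(dataset_id: str, **filters: str) -> str:
    """Hot query: ask the API for a filtered slice of the dataset."""
    qs = "&".join(f"{k}={v}" for k, v in sorted(filters.items()))
    return f"{BASE}/datasets/{dataset_id}/data" + (f"?{qs}" if qs else "")

def download_url(dataset_id: str, fmt: str = "csv") -> str:
    """Bulk download: fetch the full dataset (e.g. microdata) as a file."""
    return f"{BASE}/datasets/{dataset_id}/download?format={fmt}"

print(query_url("example-epa-reg", GEO="ES70", TIME_PERIOD="2022"))
print(download_url("example-epa-reg"))
```

The choice between the two is the "logic of need" from the interview: a dashboard wants a small hot slice, while a researcher analyzing survey microdata wants the whole file once.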

5. In addition to the API, your open data platform has several types of query tools that facilitate access to and use of the data. What can you tell us about them?   

As we mentioned earlier, our ultimate goal is to disseminate data. But this mission does not end when we include all our datasets in a data catalog; it ends when we give citizens a simple first way to consult these results. In this sense, we have different viewers that make this possible: a general viewer that allows any type of dataset to be explored, and more specific ones: the SDG indicators, the Electoral Information System, the Statistical Atlas of the Canary Islands and the Municipal Data Sheets. For us it was, and still is, important to have a set of generalist and specialized tools for the part of the population that does not regularly use data analytics systems. These are simple tools, but more than mere table viewers, with which people can access a dataset and consult the most important findings that arise from it.

6. Do you carry out any kind of monitoring of data use, and have you identified any specific use cases?

In planning the new website of the ISTAC there is a whole strategy for monitoring data usage, at least at three levels:

  • The use of our APIs, which is not currently monitored. This is the first element because, as we have already pointed out, everything that is consumed will go through APIs.
  • Traditional web analytics, consulting each of the pages.
  • Citizen interaction with our applications, to perform live usability analysis, so that we can see how citizens use the ISTAC system and, from there, make improvement decisions or implement a recommendation system.

7. What are the future plans of the ISTAC in terms of open data and reuse?

Regarding future plans in this area, we have several lines of work. The first, very important for us, is cooperation with the data ecosystem of the Government of the Canary Islands. A federated, cooperative data governance model is being configured, in which four Departments of the Government of the Canary Islands participate: the Directorate General of Modernization and Quality of Services, the Directorate General of Telecommunications and New Technologies, the Directorate General of Transparency and Citizen Participation, and the ISTAC.

In the area of open data, co-participation with the General Directorates of Transparency and Telecommunications is fundamental. This has led us to accompany them in the semantic standardization of data for the opening of the Canary Islands portal. But the process goes further: we are starting to assist with, and produce internally, the whole API ecosystem for semantic standardization, so that the datasets managed within the Government of the Canary Islands use the same classifications, in compliance with the National Interoperability Scheme, whose Article 10 establishes that the classifications and concepts used by administrative projects take as their reference those provided by the statistical system. For us this is important because it means working on good semantic data quality from the origin, with a view to later opening. It is a powerful plan for the future in pursuit of better data quality.

We are working intensively on improving the website, on generating a new one aimed at facilitating the understanding of statistical information by the public.

We are also working on other noteworthy elements: on the one hand, we are going to put in the Open Data Catalog all the classifications and concepts we use, in a reusable format, so that anyone can benefit from this possibility. And on the other hand, we are going to open new APIs, including a very important one, which is the one we use for statistical georeferencing, so that any information can be georeferenced by third parties with the quality that the ISTAC has.

At the same time, we are working hard to improve the website, to generate a new one aimed at facilitating the understanding of statistical information by citizens, beyond disseminating a data catalog as we have done so far. For example, we will include issues and debates arising in the public sphere, together with the data-based findings we can contribute. It is currently being debated, for instance, whether or not the Canary Islands are overpopulated. Public statistics has a lot to say there, but it must be presented in an easily understandable way. To this end, we are making an important investment, both in web technology and in the groundwork for the clear communication of statistical information.