StatDCAT-AP, a vocabulary for the metadata of European statistical datasets
News date: 16-03-2016

Unemployment statistics, demographic data, industrial production rates, prices, the mortgage market... these are just some examples of the statistical datasets published in open data catalogues at local, regional, national and international level. However, there is currently a high degree of fragmentation in the way these datasets are published, both in how they are represented and in their exchange formats: from CSV, SDMX and DDI files to the W3C's RDF Data Cube vocabulary.
This fragmentation creates a barrier that hinders the re-use of statistical data by potential consumers. The situation is made worse by the absence of standard mechanisms that fully describe the content of these datasets and enable their automatic discovery and sharing.
To address this, StatDCAT-AP has been launched: a vocabulary for expressing, in a structured way, the metadata of the statistical datasets currently published by the various agencies of the European Union.
StatDCAT-AP is proposed as an extension of the existing standard vocabulary DCAT-AP, defining new classes and properties that capture the specific details of statistical data. This matters because statistical datasets have a multidimensional structure. On the one hand, there are numerical variables (or measures), for example the number of persons or the volume of credit (in €). On the other hand, there are nominal variables (or dimensions) that disaggregate the values of the numerical variables: geographical, temporal or other dimensions specific to those variables. Through StatDCAT-AP, publishing agencies will therefore be able to describe this multidimensional structure of a dataset, alongside the publication-related aspects that DCAT-AP already covers natively.
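To make the split between measures and dimensions concrete, the following Python sketch writes a Turtle description of a hypothetical unemployment dataset. It does not use StatDCAT-AP's own classes and properties, which the working group had yet to define; instead it borrows existing terms from DCAT, Dublin Core and the W3C RDF Data Cube vocabulary (`qb:DataStructureDefinition`, `qb:dimension`, `qb:measure`). The `ex:` namespace, the `StatDataset` class and the `to_turtle` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Prefixes for DCAT, Dublin Core terms and the W3C RDF Data Cube vocabulary.
# The ex: namespace is a hypothetical placeholder for the publisher's own URIs.
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct:  <http://purl.org/dc/terms/> .\n"
    "@prefix qb:   <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex:   <http://example.org/stats/> .\n"
)


@dataclass
class StatDataset:
    """Hypothetical in-memory description of a statistical dataset."""
    slug: str  # local name used under the ex: namespace
    title: str
    measures: list[str] = field(default_factory=list)    # numerical variables
    dimensions: list[str] = field(default_factory=list)  # nominal variables


def to_turtle(ds: StatDataset) -> str:
    """Serialise the dataset and its structure as Turtle (illustrative sketch)."""
    if not ds.measures:
        # A Data Cube structure definition needs at least one measure.
        raise ValueError("a statistical dataset needs at least one measure")
    dsd = f"ex:{ds.slug}-structure"
    # Escape backslashes and quotes so the title is a valid Turtle string literal.
    title = ds.title.replace("\\", "\\\\").replace('"', '\\"')
    components = [f"[ qb:dimension ex:{d} ]" for d in ds.dimensions]
    components += [f"[ qb:measure ex:{m} ]" for m in ds.measures]
    lines = [
        PREFIXES,
        # Typed as both a DCAT catalogue entry and a Data Cube data set,
        # so that qb:structure can point to the structure definition.
        f"ex:{ds.slug} a dcat:Dataset, qb:DataSet ;",
        f'    dct:title "{title}"@en ;',
        f"    qb:structure {dsd} .",
        "",
        f"{dsd} a qb:DataStructureDefinition ;",
        "    qb:component " + ",\n        ".join(components) + " .",
        "",
    ]
    lines += [f"ex:{d} a qb:DimensionProperty ." for d in ds.dimensions]
    lines += [f"ex:{m} a qb:MeasureProperty ." for m in ds.measures]
    return "\n".join(lines) + "\n"


# Usage: unemployment counted in persons, broken down by area and period.
unemployment = StatDataset(
    slug="unemployment",
    title="Unemployment by region and year",
    measures=["numberOfPersons"],
    dimensions=["refArea", "refPeriod"],
)
print(to_turtle(unemployment))
```

Typing the resource as both `dcat:Dataset` and `qb:DataSet` is just one convenient way to let `qb:structure` attach the structure definition to a catalogue entry; StatDCAT-AP may link the two differently.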
The first step taken by the European Union, through Action 1.1 of the ISA Programme to improve semantic interoperability in European e-government systems, has been the creation of an open working group to develop StatDCAT-AP. This working group, made up mainly of European and national statistical agencies from EU member states, was launched in early 2016 and has already defined a short- and medium-term roadmap with specific goals for StatDCAT-AP.
The Eurostat and OECD portals (the European benchmarks for publishing statistical data) will be taken as a starting point, and the ESMS (Euro-SDMX Metadata Structure), a standard for describing statistical metadata, will also be taken into account. Whereas ESMS descriptions are textual and geared to human readers, StatDCAT-AP will allow RDF descriptions to be generated for automated search, sharing and transformation processes.
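As a rough illustration of why machine-readable descriptions help, the sketch below assumes that each catalogue entry's measures and dimensions have already been extracted from its RDF metadata into a simple record. A client can then answer a question such as "which datasets count persons broken down by geographical area?" automatically, whereas a purely textual description would have to be read by a person. The `DatasetRecord` type, the dataset identifiers and the measure and dimension names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical summary extracted from one dataset's RDF metadata."""
    identifier: str
    measures: frozenset[str]    # numerical variables
    dimensions: frozenset[str]  # nominal variables


def find_datasets(catalogue, measure=None, dimensions=()):
    """Return identifiers of datasets with the given measure and all required dimensions."""
    required = frozenset(dimensions)
    return [
        rec.identifier
        for rec in catalogue
        if (measure is None or measure in rec.measures) and required <= rec.dimensions
    ]


# Hypothetical catalogue aggregated from several publishers.
catalogue = [
    DatasetRecord("unemployment-by-region", frozenset({"numberOfPersons"}),
                  frozenset({"refArea", "refPeriod", "sex"})),
    DatasetRecord("credit-volume", frozenset({"creditVolumeEUR"}),
                  frozenset({"refPeriod", "sector"})),
    DatasetRecord("population-total", frozenset({"numberOfPersons"}),
                  frozenset({"refPeriod"})),
]

# Which datasets count persons broken down by geographical area?
print(find_datasets(catalogue, measure="numberOfPersons", dimensions=["refArea"]))
# → ['unemployment-by-region']
```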
Today more than ever, statistical data are one of the main pillars of the open data universe, given their great impact at every socioeconomic level. Public bodies not only generate but also consume statistical data to develop and carry out their strategies; the private sector (consultancies, financial institutions, industry) uses this information to understand the markets in which it operates and to create value-added products and services; and citizens, thanks to statistical data, can better understand and evaluate policies across different settings and territories. For all these reasons, StatDCAT-AP is set to become an essential tool for publishing statistical information correctly and easily, enabling its re-use.