The upcoming version of the Technical Standard for Interoperability of Public Sector Information Resources (NTI-RISP) incorporates DCAT-AP-ES as the reference model for describing datasets and data services. This is a key step towards greater interoperability, quality and alignment with European data standards.
This guide aims to help you migrate to the new model. It is intended for technical staff and managers of public data catalogs who, without advanced experience in semantics or metadata models, need to update their RDF catalog to ensure compliance with DCAT-AP-ES. The guidelines also apply to migration from other RDF-based metadata models, such as local profiles, DCAT, DCAT-AP or sectoral adaptations, because the fundamental principles and checks are the same.
Why migrate to DCAT-AP-ES?
Since 2013, the Technical Standard for the Interoperability of Public Sector Information Resources has been the regulatory framework in Spain for the management and openness of public data. In line with the European and Spanish objectives of promoting the data economy, the standard has been updated in order to promote the large-scale exchange of information in distributed and federated environments.
This update, which at the time of publication of this guide is still in administrative processing, incorporates a new metadata model aligned with the most recent European standards: DCAT-AP-ES. These standards facilitate the homogeneous description of the reusable datasets and information resources made available to the public. DCAT-AP-ES adopts the guidelines of the European metadata exchange scheme DCAT-AP (Data Catalog Vocabulary – Application Profile), thus promoting interoperability between national and European catalogues.
The advantages of adopting DCAT-AP-ES can be summarised as follows:
- Semantic and technical interoperability: ensures that different catalogs can understand each other automatically.
- Regulatory alignment: it responds to the new requirements provided for in the NTI-RISP and aligns the catalogue with Directive (EU) 2019/1024 on open data and the re-use of public sector information and with Implementing Regulation (EU) 2023/138, which establishes a list of specific High-Value Datasets (HVD), facilitating the publication of HVDs and associated data services.
- Improved ability to find resources: Makes it easier to find, locate, and reuse datasets using standardized, comprehensive metadata.
- Reduction of incidents in the federation: minimizes errors and conflicts by integrating catalogs from different Administrations, guaranteeing consistency and quality in interoperability processes.
What has changed in DCAT-AP-ES?
DCAT-AP-ES expands and orders the previous model to make it more interoperable, more legally accurate and more useful for the maintenance and technical reuse of data catalogues.
The main changes are:
- In the catalog: It is now possible to link catalogs to each other, record who created them, add a supplementary statement of rights to the license, or describe each entry using records.
- In datasets: New properties are added to comply with regulations on high-value sets, support communication, document provenance and relationships between resources, manage versions, and describe spatial/temporal resolution or website. Likewise, the responsibility of the license is redefined, moving its declaration to the most appropriate level.
- For distributions: Expanded options to indicate planned availability, legislation, usage policy, integrity, packaged formats, direct download URL, own license, and lifecycle status.
A practical and gradual approach
Many catalogs already meet the requirements set out in the 2013 version of NTI-RISP. In these cases, migrating to DCAT-AP-ES requires only minor adjustments, although the guide also covers more complex scenarios, following a progressive and adaptable approach.
The document distinguishes between the minimum compliance required and some extensions that improve quality and interoperability.
It is recommended to follow an iterative strategy: starting from the minimum core to ensure operational continuity and, subsequently, planning the phased incorporation of additional elements, such as data services, contact, applicable legislation, categorization of HVDs and contextual metadata. This approach reduces risks, distributes the effort of adaptation, and favors an orderly transition.
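The iterative strategy can be pictured as a small gap report that tells a catalog manager which phase to tackle next. The sketch below is only an illustration: the property names are real DCAT and DCAT-AP terms, but their grouping into phases is an assumption made here, and the example record is hypothetical. The authoritative lists of mandatory and recommended properties are those in the DCAT-AP-ES technical guide.

```python
# Hypothetical phase plan: the grouping of properties into phases is an
# assumption for illustration, not the official DCAT-AP-ES requirement list.
PHASES = {
    "1-minimum-core": ["dct:title", "dct:description", "dct:publisher", "dcat:distribution"],
    "2-contact-and-legislation": ["dcat:contactPoint", "dcatap:applicableLegislation"],
    "3-hvd-and-context": ["dcatap:hvdCategory", "dct:spatial", "dct:temporal"],
}


def gap_report(dataset: dict) -> dict:
    """Return, per phase, the properties that are missing or empty in `dataset`."""
    report = {}
    for phase, properties in PHASES.items():
        missing = [prop for prop in properties if not dataset.get(prop)]
        if missing:
            report[phase] = missing
    return report


def next_phase(dataset: dict):
    """Name of the earliest phase that still has gaps, or None if none remain."""
    report = gap_report(dataset)
    # Phase names start with their order number, so the smallest key comes first.
    return min(report) if report else None


# Hypothetical dataset record already covering the minimum core.
example = {
    "dct:title": "Air quality measurements",
    "dct:description": "Hourly readings from municipal stations.",
    "dct:publisher": "https://example.org/org/city-council",
    "dcat:distribution": ["https://example.org/dist/air-quality-csv"],
    "dcat:contactPoint": "",  # present but empty, so it still counts as a gap
}

print(next_phase(example))
print(gap_report(example))
```

Running the report over every dataset in a catalog gives a rough picture of how much work each phase involves before it is scheduled.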
Once the first adjustments have been made, the catalogue can be federated with both the National Catalogue, hosted in datos.gob.es, and the Official European Data Catalogue, progressively increasing the quality and interoperability of the metadata.
The guide is a technical support material that facilitates a basic transition, in accordance with the minimum interoperability requirements. In addition, it complements other reference resources, such as the DCAT-AP-ES Application Profile Model and Implementation Technical Guide, the implementation examples (Migration from NTI-RISP to DCAT-AP-ES and Migration from NTI-RISP to DCAT-AP-ES HVD), and the complementary conventions to the DCAT-AP-ES model that define additional rules to address practical needs.
Context and need for an update
Data is a key resource in the digital transformation of public administrations. Ensuring its access, interoperability and reuse is fundamental to improve transparency, foster innovation and enable the development of efficient public services centered on citizens.
In this context, the Technical Standard for Interoperability for the Reuse of Information Resources (NTI-RISP) has been the regulatory framework in Spain for the management and opening of public data since 2013. The standard sets common conditions for the selection, identification, description, format, terms of use and provision of documents and information resources produced or held by the public sector. These cover numerous areas of interest, such as social, economic, legal, tourism, business and educational information, in full compliance with Law 37/2007, of November 16.
In recent months, the text has been undergoing modernization in line with the European and Spanish objective of boosting the data economy, promoting its large-scale exchange within distributed and federated environments, guaranteeing adequate cybersecurity conditions and respecting European principles and values.
The new standard, currently in the processing stage, refers to a new metadata model aligned with the latest versions of European standards, which facilitate the description of datasets and reusable information resources made publicly available.
This new metadata model, called DCAT-AP-ES, adopts the guidelines of the European metadata exchange schema DCAT-AP (Data Catalog Vocabulary – Application Profile) with some additional restrictions and adjustments. DCAT-AP-ES is aligned with the European standards DCAT-AP 2.1.1 and the extension DCAT-AP-HVD 2.2.0, which incorporates the requirements for High-Value Datasets (HVD) defined by the European Commission.
What is DCAT-AP and how is it applied in Spain?
DCAT-AP is an application profile based on the DCAT vocabulary from the W3C, designed to improve the interoperability of public sector open data catalogues in Europe. Its goal is to provide a common metadata model that facilitates the exchange, aggregation and federation of catalogues from different countries and organizations (interoperability).
DCAT-AP-ES, as the Spanish application profile of DCAT-AP, is designed to adapt to the particulars of the national context, ensuring efficient management of open data at the national, regional and local levels.
DCAT-AP-ES is established as the standard to be considered in the new version of the NTI-RISP, which in turn is framed within the National Interoperability Framework (ENI), regulated by Royal Decree 4/2010, which sets the conditions for the reuse of public sector information in Spain.
What's new in DCAT-AP-ES
The new version of DCAT-AP-ES introduces significant improvements that facilitate interoperability and data management in the digital ecosystem. Among others:
Alignment with DCAT-AP
- Greater compatibility with European open data catalogues by aligning NTI-RISP with the EU standard DCAT-AP.
- Inclusion of advanced properties to improve the description of datasets and data services, enabling the capabilities listed below.
Incorporation of metadata for the description of High-Value Datasets (HVD)
- Facilitates compliance with European regulation on high-value data.
- Enables detailed description of data in key sectors such as geospatial, meteorology, earth observation and environment, statistics, mobility and business.
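As a rough illustration of what describing an HVD involves, the sketch below adds two properties from the DCAT-AP HVD extension to a dataset expressed as a JSON-LD-style dictionary: `dcatap:applicableLegislation`, pointing at the ELI identifier of Implementing Regulation (EU) 2023/138, and `dcatap:hvdCategory`. The dataset and the category URI are hypothetical placeholders; real category codes come from the EU controlled vocabulary of HVD categories, and the DCAT-AP-ES guide remains the reference for the full set of requirements.

```python
import json

# ELI identifier of Implementing Regulation (EU) 2023/138.
HVD_REGULATION_ELI = "http://data.europa.eu/eli/reg_impl/2023/138/oj"

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dcatap": "http://data.europa.eu/r5r/",
}


def mark_as_hvd(dataset: dict, category_uri: str) -> dict:
    """Return a copy of `dataset` flagged as a High-Value Dataset."""
    tagged = dict(dataset)  # shallow copy, the input record is left untouched
    tagged["dcatap:applicableLegislation"] = {"@id": HVD_REGULATION_ELI}
    tagged["dcatap:hvdCategory"] = {"@id": category_uri}
    return tagged


# Hypothetical dataset; the category URI below is a placeholder, not a real code.
dataset = {
    "@context": CONTEXT,
    "@id": "https://example.org/dataset/road-network",
    "@type": "dcat:Dataset",
    "dct:title": "Road network",
}
hvd_dataset = mark_as_hvd(dataset, "https://example.org/hvd-category/mobility")

print(json.dumps(hvd_dataset, indent=2))
```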
Improvements in the description of data services
- Inclusion of specific metadata to describe APIs and data access services.
- Ability to expose a dataset in different contexts (e.g. geospatial, with a map server, or statistical, with a data API).
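A minimal sketch of that idea, assuming a JSON-LD-style representation: two data services, a map server and a statistical API, both declare that they serve the same dataset through `dcat:servesDataset`. The class and property names (`dcat:DataService`, `dcat:endpointURL`, `dcat:servesDataset`) come from the DCAT vocabulary; the identifiers, URLs and the helper functions are hypothetical.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def data_service(service_id: str, title: str, endpoint_url: str, dataset_ids: list) -> dict:
    """Describe an API or access service and the datasets it serves."""
    return {
        "@context": CONTEXT,
        "@id": service_id,
        "@type": "dcat:DataService",
        "dct:title": title,
        "dcat:endpointURL": {"@id": endpoint_url},
        "dcat:servesDataset": [{"@id": dataset_id} for dataset_id in dataset_ids],
    }


def services_for(dataset_id: str, all_services: list) -> list:
    """Titles of the services that declare they serve `dataset_id`."""
    return [
        service["dct:title"]
        for service in all_services
        if {"@id": dataset_id} in service["dcat:servesDataset"]
    ]


# Hypothetical identifiers and endpoints.
DATASET = "https://example.org/dataset/population-by-municipality"
services = [
    data_service("https://example.org/service/wms", "Map server",
                 "https://example.org/wms", [DATASET]),
    data_service("https://example.org/service/stats-api", "Statistical data API",
                 "https://example.org/api/v1", [DATASET]),
]

print(services_for(DATASET, services))
print(json.dumps(services[0], indent=2))
```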
Support for provenance and data quality
- Incorporation of new properties to manage lifecycle, versioning and origin.
- Implementation of validation and quality control mechanisms using SHACL, ensuring consistency and structure of metadata in catalogues.
Use of controlled vocabularies and best practices
- Adoption of standardized vocabularies for licenses, data formats, languages and themes.
- Greater clarity in data classification to facilitate discovery.
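In practice, using controlled vocabularies often means replacing free-text values in a legacy catalog with URIs from a shared authority list. The sketch below assumes the EU Publications Office authority tables for file types and languages as targets; the lookup tables themselves are hypothetical and cover only a handful of values, so anything not listed is returned as `None` for manual review.

```python
FILE_TYPE_BASE = "http://publications.europa.eu/resource/authority/file-type/"
LANGUAGE_BASE = "http://publications.europa.eu/resource/authority/language/"

# Hypothetical lookup tables mapping legacy free-text values to authority codes.
FORMAT_CODES = {
    "csv": "CSV",
    "text/csv": "CSV",
    "json": "JSON",
    "application/json": "JSON",
    "xml": "XML",
}
LANGUAGE_CODES = {"es": "SPA", "en": "ENG"}


def normalize(value: str, codes: dict, base: str):
    """Map a free-text value to a controlled-vocabulary URI, or None if unknown."""
    code = codes.get(value.strip().lower())
    return base + code if code else None


legacy_formats = [" CSV ", "application/json", "parquet"]
for raw in legacy_formats:
    print(repr(raw), "->", normalize(raw, FORMAT_CODES, FILE_TYPE_BASE))
```

Returning `None` instead of guessing keeps unmapped values visible, so they can be reviewed rather than silently published with the wrong term.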
Data governance and improved agent management
- Specification of agent roles (creator, publisher) and contact points.
- Enhanced metadata to represent resource provenance.
Validation of conformity and metadata quality
- Guides to help validate that metadata complies with DCAT-AP-ES.
- Validation of DCAT-AP-ES graphs against SHACL templates.
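SHACL shapes express constraints such as "every dataset has at least one title". The sketch below is not a SHACL engine and does not use the official DCAT-AP-ES shapes: it only imitates a minimum-cardinality rule (in the spirit of `sh:minCount`) over plain Python tuples, to show the kind of check a validator performs. The rules and triples are assumptions chosen for illustration; an actual validation would run a SHACL processor against the published shapes.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical constraints: (target class, property, minimum number of values).
MIN_COUNT_RULES = [
    ("dcat:Dataset", "dct:title", 1),
    ("dcat:Dataset", "dct:description", 1),
    ("dcat:Distribution", "dcat:accessURL", 1),
]


def validate(triples):
    """Return a list of (node, property, found, required) violations."""
    types = defaultdict(set)   # node -> set of classes
    counts = defaultdict(int)  # (node, property) -> number of values
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types[subject].add(obj)
        else:
            counts[(subject, predicate)] += 1

    violations = []
    for node in sorted(types):
        for target_class, prop, minimum in MIN_COUNT_RULES:
            found = counts[(node, prop)]
            if target_class in types[node] and found < minimum:
                violations.append((node, prop, found, minimum))
    return violations


# Hypothetical graph: the dataset lacks a description.
triples = [
    ("ex:ds1", RDF_TYPE, "dcat:Dataset"),
    ("ex:ds1", "dct:title", "Air quality"),
    ("ex:dist1", RDF_TYPE, "dcat:Distribution"),
    ("ex:dist1", "dcat:accessURL", "https://example.org/air.csv"),
]

for node, prop, found, required in validate(triples):
    print(f"{node}: {prop} has {found} value(s), at least {required} required")
```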
Key benefits of the update
The adoption of DCAT-AP-ES represents a qualitative leap in the management and reuse of open data in Spain. Among its benefits are:
✅ Facilitates the federation of catalogues and the discovery of data.
✅ Improves interoperability with the European open data ecosystem.
✅ Complies with European open data regulations.
✅ Increases metadata quality through validation mechanisms.
✅ Ensures that data are FAIR (Findable, Accessible, Interoperable, Reusable).
Implementation and next steps
When will it come into force?
The new application profile DCAT-AP-ES will be progressively implemented in Spain's open data catalogues. It will become mandatory once the amended text of the standard comes into force. As mentioned earlier, that text is still undergoing administrative processing, but the profile is already compatible with the datos.gob.es data federator.
Are there supporting materials and resources for implementing DCAT-AP-ES?
The management team of the datos.gob.es platform has developed the DCAT-AP-ES Technical guide and model, available in the datos.gob.es repository.
This repository will be enriched as new needs are identified among users applying the standard. Help guides and educational resources will also be developed to facilitate its adoption by publishing organizations. All news and resources produced in the context of the application profile will be announced and referenced promptly on datos.gob.es.
Where to find more information?
The updated documentation, guides and resources will be accessible on datos.gob.es and in the associated code repository. At present the following are available:
- DCAT-AP-ES Technical guide and model
- DCAT-AP-ES Conventions
- DCAT-AP-ES Implementation examples
- DCAT-AP-ES Frequently Asked Questions
- DCAT-AP-ES Metadata validation
- DCAT-AP explanatory video: Spanish / English
- datos.gob.es
Data is the engine of innovation, and its transformative potential is reflected in all areas, especially in health. From faster diagnoses to personalized treatments to more effective public policies, the intelligent use of health information has the power to change lives in profound and meaningful ways.
But, for this data to unfold its full value and become a real force for progress, it is essential that it "speaks the same language". That is, it must be well organized, easy to find, and shareable securely and consistently across systems, countries, and practitioners.
This is where HealthDCAT-AP comes into play, a new European specification that, although it sounds technical, has a lot to do with our well-being as citizens. HealthDCAT-AP is designed to describe health data, from aggregated statistics to anonymized clinical records, in a homogeneous, clear, and reusable way through metadata. In short, it does not act on the clinical data itself, but makes that data easier to locate and understand thanks to a standardized description.
HealthDCAT-AP is exclusively concerned with metadata, i.e., how datasets are described and organized in catalogs. This sets it apart from HL7, FHIR, and DICOM, which structure the exchange of clinical information and images; CDA, which describes the architecture of documents; and SNOMED CT, LOINC, and ICD-10, which standardize the semantics of diagnoses, procedures, and observations so that data have the same meaning in any context.
This article explores how HealthDCAT-AP, in the context of the European Health Data Space (EHDS) and the National Health Data Space (ENDS), brings value primarily to those who reuse data—such as researchers, innovators, or policymakers—and ultimately benefits citizens through the advances they generate.
What is HealthDCAT-AP and how does it relate to DCAT-AP?
Imagine a huge library full of health books, but without any system to organize them. Searching for specific information would be a chaotic task. Something similar happens with health data: if it is not well described, locating and reusing it is practically impossible.
HealthDCAT-AP was born to solve this challenge. It is a European technical specification that allows for a clear and uniform description of health datasets within data catalogues, making it easier to search, access, understand and reuse them. In other words, it makes the description of health data speak the same language across Europe, which is key to improving health care, research and policy.
This technical specification is based on DCAT-AP, the general specification used to describe catalogues of public sector datasets in Europe. While DCAT-AP provides a common structure for all types of data, HealthDCAT-AP is its specialized health extension, adapting and extending that model to cover the particularities of clinical, epidemiological, or biomedical data.
HealthDCAT-AP was developed within the framework of the European EHDS2 (European Health Data Space 2) pilot project and continues to evolve thanks to the support of projects such as HealthData@EU Pilot, which are working on the deployment of the future European health data infrastructure. The specification is under active development and its most recent version, along with documentation and examples, can be publicly consulted in its official GitHub repository.
HealthDCAT-AP is also designed to apply the FAIR principles: that data is Findable, Accessible, Interoperable and Reusable. This means that although health data may be complex or sensitive, its description (metadata) is clear, standardized, and useful. Any professional or institution – whether in Spain or in another European country – can know what data exists, how to access it and under what conditions. This fosters trust, transparency, and responsible use of health data. HealthDCAT-AP is also a cornerstone of EHDS and therefore ENDS. Its adoption will allow hospitals, research centres or administrations to share information consistently and securely across Europe. Thus, collaboration between countries is promoted and the value of data is maximized for the benefit of all citizens.
To facilitate its use and adoption, the European initiatives mentioned above have produced tools such as the HealthDCAT-AP editor and validator, which allow any organization to generate compliant metadata descriptions of its datasets without advanced technical knowledge. This removes barriers and encourages more entities to participate in this networked health data ecosystem.
How does HealthDCAT-AP contribute to the public value of health data?
Although HealthDCAT-AP is a technical specification focused on the description of health datasets, its adoption has practical implications that go beyond the technological realm. By offering a common and structured way of documenting what data exists, how it can be used and under what conditions, it helps different actors, from hospitals and administrations to research centres and startups, to better access, combine and reuse the available information. This enables the so-called secondary use of that information, beyond its primary healthcare use.
- Faster diagnoses and personalized treatments: When data is well-organized and accessible to those who need it, advances in medical research accelerate. This makes it possible to develop artificial intelligence tools that detect diseases earlier, identify patterns in large populations and adapt treatments to the profile of each patient. It is the basis of personalized medicine, which improves results and reduces risks.
- Better access to knowledge about what data exists: HealthDCAT-AP makes it easier for researchers, healthcare managers or authorities to locate useful datasets, thanks to its standardized description. This can facilitate, for example, the analysis of health inequalities or resource planning in crisis situations.
- Greater transparency and traceability: The use of metadata allows us to know who is responsible for each set of data, for what purpose it can be used and under what conditions. This strengthens trust in the data reuse ecosystem.
- More efficient healthcare services: Standardizing metadata improves information flows between sites, regions, and systems. This reduces bureaucracy, avoids duplication, optimizes the use of resources, and frees up time and money that can be reinvested in improving direct patient care.
- More innovation and new solutions for the citizen: by facilitating access to larger datasets, HealthDCAT-AP promotes the development of new patient-centric digital tools: self-care apps, remote monitoring systems, service comparators, etc. Many of these solutions are born outside the health system – in universities, startups or associations – but directly benefit citizens.
- A connected Europe around health: By sharing a common way of describing data, HealthDCAT-AP makes it possible for a dataset created in Spain to be understood and used in Germany or Finland, and vice versa. This promotes international collaboration, strengthens European cohesion and ensures that citizens benefit from scientific advances regardless of their country.
And what role does Spain play in all this?
Spain is not only aligned with the future of health data in Europe: it is actively participating in its construction. Thanks to a solid legal foundation, a largely digitized healthcare system, accumulated experience in the secure sharing of health information within the Spanish National Health System (SNS), and a long history of open data—through initiatives such as datos.gob.es—our country is in a privileged position to contribute to and benefit from the European Health Data Space (EHDS).
Over the years, Spain has developed legal frameworks and technical capacities that anticipate many of the requirements of the EHDS Regulation. The widespread digitalization of healthcare and the experience in using data in a secure and responsible way allow us to move towards an interoperable, ethical and common good-oriented model.
In this context, the National Health Data Space project represents a decisive step forward. This initiative aims to become the national reference platform for the analysis and exploitation of health data for secondary use, conceived as a catalyst for research and innovation in health, a benchmark in the application of disruptive solutions, and a gateway to different data sources. All of this is carried out under strict conditions of anonymization, security, transparency, and protection of rights, ensuring that the data is only used for legitimate purposes and in full compliance with current regulations.
Spain's familiarity with standards such as DCAT-AP facilitates the deployment of HealthDCAT-AP. Platforms such as datos.gob.es, which already act as a reference point for the publication of open data, will be key in its deployment and dissemination.
Conclusions
HealthDCAT-AP may sound technical, but it is actually a specification that can have an impact on our daily lives. By helping to better describe health data, it makes it easier for that information to be used in a useful, safe, and responsible manner.
This specification allows the description of datasets to speak the same language across Europe. This makes the data easier to find, to share with the right people, and to reuse for purposes that benefit us all: faster diagnoses, more personalized treatments, better public health decisions, and new digital tools that improve our quality of life.
Spain, thanks to its experience in open data and its digitized healthcare system, is actively participating in this transformation through a joint effort between professionals, institutions, companies, researchers, etc., and also citizens. Because when data is understood and managed well, it can make a difference. It can save time, resources, and even lives.
HealthDCAT-AP is not just a technical specification: it is a step forward towards more connected, transparent, and people-centered healthcare. A specification designed to maximize the secondary use of health information, so that all of us as citizens can benefit from it.
Content created by Dr. Fernando Gualo, Professor at UCLM and Government and Data Quality Consultant. The content and views expressed in this publication are the sole responsibility of the author.
Open data plays a relevant role in technological development for many reasons. For example, it is a fundamental component in informed decision making, in process evaluation or even in driving technological innovation. Provided they are of the highest quality, up-to-date and ethically sound, data can be the key ingredient for the success of a project.
In order to fully exploit the benefits of open data in society, the European Union has several initiatives to promote the data economy, a single digital model that encourages data sharing, emphasizing data sovereignty and data governance, the ideal and necessary framework for open data.
In the data economy, as stated in current regulations, the privacy of individuals and the interoperability of data are guaranteed. The regulatory framework is responsible for ensuring compliance with this premise. An example of this can be the modification of Law 37/2007 for the reuse of public sector information in compliance with European Directive 2019/1024. This regulation is aligned with the European Union's Data Strategy, which defines a horizon with a single data market in which a mutual, free and secure exchange between the public and private sectors is facilitated.
To achieve this goal, key issues must be addressed, such as preserving certain legal safeguards or agreeing on common metadata description characteristics that datasets must meet to facilitate cross-industry data access and use, i.e. using a common language to enable interoperability between dataset catalogs.
What are metadata standards?
A first step towards data interoperability and reuse is to develop mechanisms that enable a homogeneous description of the data, one that both humans and machines can easily interpret and process. To this end, different vocabularies have been created and, through consensus over time, have become standards.
Standardized vocabularies offer semantics that serve as a basis for the publication of data sets and act as a "legend" to facilitate understanding of the data content. In the end, it can be said that these vocabularies provide a collection of metadata to describe the data being published; and since all users of that data have access to the metadata and understand its meaning, it is easier to interoperate and reuse the data.
W3C: DCAT and DCAT-AP Standards
At the international level, several organizations that create and maintain standards can be highlighted:
- World Wide Web Consortium (W3C): developed the Data Catalog Vocabulary (DCAT): a description standard designed with the aim of facilitating interoperability between catalogs of datasets published on the web.
- Subsequently, taking DCAT as a basis, DCAT-AP was developed, a specification for the exchange of data descriptions published in data portals in Europe that has more specific DCAT-AP extensions such as:
- GeoDCAT-AP which extends DCAT-AP for the publication of spatial data.
- StatDCAT-AP which also extends DCAT-AP to describe statistical content datasets.
ISO: International Organization for Standardization
In addition to the World Wide Web Consortium, other organizations are dedicated to standardization, for example the International Organization for Standardization (ISO).
- Among many other types of standards, ISO has also defined standards for the metadata of data catalogs:
- ISO 19115 for describing geographic information. As with DCAT, extensions and technical specifications have also been developed from ISO 19115, for example:
- ISO 19115-2 for raster data and imagery.
- ISO 19139, which provides an XML implementation of the vocabulary.
The horizon in metadata standards: challenges and opportunities
In the Action Plan of the International Open Data Conference, capacity development has become a priority within the international open data movement. Training tools are essential for leaders responsible for PSI policies, data producers and reusers in the public and private sectors, and even citizens. For this reason, providing training tools that allow these different agents to advance in the openness and re-use of public data is a priority task.
To this end, eight training units have been developed within the dissemination, awareness-raising and training line of Iniciativa Aporta. They are aimed at all kinds of audiences, from citizens reading about open data for the first time to public employees responsible for open information initiatives who want to expand their knowledge of the field.
The training units are designed to help learners understand the basic concepts of the open data movement and to cover best practices in the implementation of open data policies and their re-use, methodological guidelines for open data, technical regulations such as DCAT-AP and NTI-RISP, and the use of data processing tools, among other aspects.
Two types of learning have been taken into account in developing the resources: learning by discovery, which extends knowledge by working through the questions and reflections raised, and meaningful learning, which builds on prior knowledge and uses practical examples to contextualize and apply the concepts covered.
In addition, the training modules contain complementary materials in the form of links to external pages and documents that can be downloaded for offline use. In this way, students can deepen their knowledge and become familiar with relevant sources of reliable, up-to-date information about the open data sector.
All units are distributed under the Creative Commons Attribution-ShareAlike licence (CC BY-SA), which allows the material to be copied and distributed in any medium or format and adapted to create new resources.
The training material developed by Iniciativa Aporta consists of eight didactic units that address the following contents of the open data sector:
- Basic concepts, benefits and barriers of open data
- Legal framework
- Trends and best practices in the implementation of open data policies
- The re-use of public data and its transformative role
- Methodological guidelines for open data
- DCAT-AP and NTI-RISP
- Use of basic tools for data treatment
- Best practices in the design of APIs and Linked Data
Each unit is designed to help students expand their knowledge of the open data sector. To facilitate understanding, all units share a similar structure that includes objectives, contents, evaluation activities, practical examples, complementary information and conclusions.
All the training units can be completed online, directly on datos.gob.es. Alternatively, they can be downloaded to the user's device or even loaded onto an LMS platform.
Each unit is independent, so students can acquire the knowledge they need on a specific subject according to their training needs. However, students who want a more complete view of the PSI sector can complete the full series of eight training units to learn in depth about the most relevant aspects of open data initiatives.
The training units are available in the "Documentation" section under the category "Training materials" to be carried out through the online portal or to be downloaded.
Training materials of the Aporta Initiative
The DCAT-AP application profile aims to describe, using metadata, the catalogs and datasets of European open data portals. To do so, DCAT-AP builds on the Data Catalog Vocabulary (DCAT), published by W3C. In particular, DCAT-AP is a specification that describes a series of restrictions (such as property ranges) on the DCAT model.
In a context of continuous economic, technological and social change, this application profile is constantly evolving and improving to meet users' demands. The maintenance and evolution of DCAT-AP are managed through Joinup, a collaborative platform created by the European Commission and financed by the European Union through the ISA and ISA² Programmes. Through this platform, different versions of DCAT-AP and guidelines for their implementation have been published.
To help organizations that have doubts about how to apply this profile, here are the main DCAT-AP documents and resources available on Joinup:
Documents | Description |
---|---|
DCAT-AP versions | The different versions of the DCAT-AP profile are shown on a timeline, making it easy to access the latest one. |
Implementation guidelines | A list of technical and organizational guidelines to facilitate the implementation of DCAT-AP, with implementation examples that can help solve different challenges. Users can also share the tools they have developed (such as validators). |
National extensions analysis of DCAT-AP | Based on the DCAT-AP specification, each EU country has produced a series of adaptations to meet its own needs. This analysis covers these extensions, looking for recurring patterns that could be used as input for future versions of DCAT-AP. |
GeoDCAT-AP | An extension of DCAT-AP for the exchange of descriptions of geospatial datasets and services. |
StatDCAT-AP | An extension of DCAT-AP for the exchange of descriptions of statistical datasets and services. |
Change and Release Management Policy for DCAT-AP | Documentation on the kinds of changes that can be made to DCAT-AP. It distinguishes three types of change according to their implications for interoperability: bug fixes, minor semantic changes and major semantic changes. |
Tools library | Includes tools developed by SEMIC or by users to promote semantic interoperability. |
Document library | This page keeps track of studies carried out under different actions of the ISA² Programme, grouped by topic. |
In addition to these resources, there are task forces that have produced reports on specific fields, such as research. For its part, the W3C has also published reports and tools to help users, such as the guide Dataset Exchange Use Cases and Requirements, in this case focused on DCAT.
To stay informed about changes and newly published documents, users can subscribe to the GitHub project created to share experiences, challenges and suggestions for new features.
Datos.gob.es is also part of Joinup's network of collaborators, so we actively participate in disseminating the contents and resources created to facilitate the implementation of DCAT-AP. If you want to know more about the DCAT-AP application profile, we recommend the report DCAT-AP and its extensions: Context and evolution.
One of the main challenges that arises when addressing an Open Data initiative is defining the information architecture and facilitating interoperability between the data catalogs published by different portals on the Web. To address this challenge, the World Wide Web Consortium (W3C) published the Data Catalog Vocabulary (DCAT), an RDF vocabulary for describing data catalogs based on three key concepts: catalog, dataset and distribution.
Based on this vocabulary, and within Joinup, a collaborative platform created by the European Commission, an international group of experts developed the DCAT Application Profile for data portals in Europe (DCAT-AP): a specification that defines restrictions (such as property ranges) on the DCAT model. The objective is to facilitate, through metadata, the homogenization and cross-searching of the data that the public sector publishes on different European data portals and makes available to citizens for reuse.
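As a rough illustration of what a range restriction means in practice, the sketch below checks that the object of a property has the class the profile expects. The three property-to-class pairs are only a small subset picked for illustration, the `ex:` identifiers are hypothetical, and the check ignores subclass reasoning, so it should be read as a sketch rather than a validator.

```python
# Illustrative subset of expected ranges; consult the DCAT-AP specification
# for the full, authoritative list.
EXPECTED_RANGE = {
    "dct:publisher": "foaf:Agent",
    "dcat:theme": "skos:Concept",
    "dcat:distribution": "dcat:Distribution",
}


def range_violations(triples, types):
    """Return (subject, property, object, expected_class) for each mismatch.

    triples: iterable of (subject, property, object) tuples.
    types:   dict mapping a resource identifier to its declared class.
    Resources with no declared class are reported as violations too.
    """
    problems = []
    for subject, prop, obj in triples:
        expected = EXPECTED_RANGE.get(prop)
        if expected is not None and types.get(obj) != expected:
            problems.append((subject, prop, obj, expected))
    return problems


# Hypothetical data: the publisher is typed correctly, the distribution is not.
types = {"ex:ministry": "foaf:Agent", "ex:file": "dcat:Dataset"}
triples = [
    ("ex:dataset", "dct:publisher", "ex:ministry"),
    ("ex:dataset", "dcat:distribution", "ex:file"),
]
for problem in range_violations(triples, types):
    print(problem)
```

In a real catalog this kind of check is normally delegated to a dedicated validator working on the RDF graph itself.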
The report DCAT-AP and its extensions: context and evolution, developed within the framework of the Aporta Initiative, was produced to put DCAT-AP and the DCAT vocabulary in context and examine them in depth. The report describes both specifications, as well as the agencies and institutions involved in defining them.
DCAT-AP extensions and modifications
Sector-specific extensions have been developed on top of DCAT-AP. The report describes some of the most relevant ones for specific areas of application:
- DCAT-AP HVD, an extension for the description of high-value data.
- GeoDCAT-AP, focused on the exchange of descriptions of geospatial datasets and services.
- StatDCAT-AP, for the exchange of descriptions of statistical datasets and services.
- MLDCAT-AP, which extends DCAT-AP to the field of machine learning.
- BRegDCAT-AP, for the description of fundamental aspects of public administrative records.
Since its appearance, practically all the Member States of the European Union have extended the DCAT-AP application profile to meet their needs. Spain is a special case: its "extension", the Norma Técnica de Interoperabilidad de Reutilización de Recursos de Información (NTI-RISP), preceded the DCAT-AP specification itself and, at the time of writing, has evolved into the DCAT-AP-ES reference extension. The NTI-RISP establishes the common framework for opening and using documents and information resources produced or held by public administrations. This technical standard aims to ensure the persistence of information and the use of formats, and to promote appropriate terms and conditions of use. Because the NTI-RISP predates the first versions of DCAT and DCAT-AP, there are some differences between them.
Finally, some DCAT-AP extensions implemented by the different Member States are listed for reference.
The European Commission has taken an important step by completing the DCAT Application Profile (DCAT-AP), a joint initiative of the ISA Programme, the EU Publications Office and DG CONNECT. This specification extends the W3C Data Catalog Vocabulary (DCAT) and defines a regulatory policy for applying it to the description of public sector datasets in Europe.
The goal of DCAT is quite simple, which is why the open data community has welcomed it since the W3C published the specification as a W3C Recommendation in January 2014. The aim is to provide an RDF vocabulary (a set of classes and properties) designed to describe, in a structured manner, the content of datasets and data catalogues on the Web. As an example, imagine an organization that wishes to publish a set of CSV files containing economic indicators on a given topic. With DCAT, this entity can provide machine-processable descriptions (in RDF) that identify these files as a catalogue (dcat:Catalog), in which each file is a dataset (dcat:Dataset) whose topic is identified by elements (skos:Concept) of a controlled vocabulary, a thesaurus, a taxonomy, etc.
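The CSV scenario above can be sketched with a few lines of Python that emit a Turtle description. This is a minimal illustration, not a reference implementation: every `example.org` URI, title and label is a hypothetical placeholder, and giving each dataset a single distribution that points at its file is an assumption of this sketch.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def literal(text, lang="en"):
    """Format a language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def describe_catalog(catalog_uri, title, csv_files, theme_uri, theme_label):
    """Return Turtle for a catalog with one dataset and distribution per CSV file.

    csv_files is a non-empty list of (dataset_title, file_url) pairs.
    """
    if not csv_files:
        raise ValueError("at least one CSV file is required")
    out = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    dataset_uris = [f"{catalog_uri}/dataset/{i}" for i in range(1, len(csv_files) + 1)]

    # The catalog lists every dataset it contains.
    out += [
        "",
        f"<{catalog_uri}> a dcat:Catalog ;",
        f"    dct:title {literal(title)} ;",
        "    dcat:dataset " + ", ".join(f"<{uri}>" for uri in dataset_uris) + " .",
    ]

    # The topic is a concept taken from a controlled vocabulary.
    out += [
        "",
        f"<{theme_uri}> a skos:Concept ;",
        f"    skos:prefLabel {literal(theme_label)} .",
    ]

    # One dataset per file, each with a distribution giving access to the CSV.
    for uri, (dataset_title, file_url) in zip(dataset_uris, csv_files):
        out += [
            "",
            f"<{uri}> a dcat:Dataset ;",
            f"    dct:title {literal(dataset_title)} ;",
            f"    dcat:theme <{theme_uri}> ;",
            f"    dcat:distribution <{uri}/csv> .",
            "",
            f"<{uri}/csv> a dcat:Distribution ;",
            f"    dcat:accessURL <{file_url}> .",
        ]
    return "\n".join(out) + "\n"


print(describe_catalog(
    "http://example.org/catalog",
    "Economic indicators",
    [("Quarterly GDP", "http://example.org/files/gdp.csv"),
     ("Consumer prices", "http://example.org/files/cpi.csv")],
    "http://example.org/themes/economy",
    "Economy",
))
```

Building the text by hand keeps the example free of dependencies; in production an RDF library would normally handle serialization and escaping.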
DCAT-AP’s main contributions can be summarised as follows:
- It does not introduce a new vocabulary. On the contrary, its aim is to define precisely how certain DCAT classes and properties are used for publishing data in the European Union. Where describing catalogues and datasets requires extensions that are not present in DCAT, other existing vocabularies are reused (as with foaf:Document, which identifies the web portals where catalogues are published).
- It defines a complete policy for the use of DCAT-AP, specifying which classes and properties are compulsory, recommended or optional in the application of the vocabulary within the European Union.
- It establishes regulatory principles of conformance for publishing and using DCAT-AP documents (section 6 of the specification).
- It explains the use of controlled vocabularies (in SKOS) for the description of the topic of datasets, expanding the previous example; of particular importance is the explicit recommendation on reuse of European vocabularies such as Eurovoc. This opens the door to potential applications of DCAT-AP in the field of public procurement.
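The obligation levels mentioned in the list above lend themselves to a simple automated check. The sketch below reports which mandatory properties a description is missing. The property sets are an assumption based on the mandatory properties of DCAT-AP v1.1 and should be verified against the version of the specification you target; the dictionary layout of a resource is a hypothetical structure invented for this example.

```python
# Assumption: mandatory properties per class as listed in DCAT-AP v1.1.
MANDATORY = {
    "dcat:Catalog": {"dct:title", "dct:description", "dct:publisher", "dcat:dataset"},
    "dcat:Dataset": {"dct:title", "dct:description"},
    "dcat:Distribution": {"dcat:accessURL"},
}


def missing_mandatory(resource):
    """Return the sorted mandatory properties that have no value.

    resource is a dict with a 'type' (class name) and 'properties'
    (a dict mapping each property name to a list of values).
    Classes not listed in MANDATORY are treated as having no requirements.
    """
    required = MANDATORY.get(resource["type"], set())
    present = {name for name, values in resource["properties"].items() if values}
    return sorted(required - present)


# Hypothetical dataset description with an empty description.
dataset = {
    "type": "dcat:Dataset",
    "properties": {
        "dct:title": ["Quarterly GDP"],
        "dct:description": [],
    },
}
print(missing_mandatory(dataset))
```

Recommended and optional properties could be handled the same way with additional tables, reporting warnings instead of errors.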
Additionally, certain new developments in DCAT-AP are taking place:
- An extension of DCAT-AP for the exchange of descriptions of geospatial datasets and services: GeoDCAT-AP. The working group has already published a first version (v1.0), still in the working draft phase. Its main aim is to provide an RDF syntax to combine the metadata framework of the INSPIRE initiative and ISO 19115:2003, in accordance with the conformity principles laid down by DCAT-AP.
- An extension of DCAT-AP for publishing statistical datasets: StatDCAT-AP. A working group has recently been created for this purpose. In this first phase, the work will concentrate on finding significant common metadata across the portals that publish statistical data, such as Eurostat. By seeking these similarities, StatDCAT-AP aims to be to the RDF Data Cube vocabulary what the SDMX/EMS metadata framework is to the SDMX specification.
Finally, the European Commission has opened a line of work to define the guidelines for DCAT-AP implementation. In this regard, European organizations are invited to participate and share real cases of DCAT-AP applications as well as the problems and challenges faced in their implementation. This new specification and its early adoption by the European Data Portal Project is a good sign for the Open Data sector. The specification (v1.1) is published within the JoinUp project.