How IDS-RAM could help in the creation of open data ecosystems

News date: 28-10-2022


Many organisations and public administrations have found open data to be a transformational pillar on which to build their data-culture strategy. Having structured access to data is the basis for new business models, as well as for new citizen-oriented initiatives in many fields.

However, realising the full potential of open data requires a platform that can make this data available to third parties while ensuring its quality, comprehensibility, privacy and security.

In this context, the book "Designing Data Spaces" includes a chapter by Fabian Kirstein and Vincent Bohlen that proposes using IDS-RAM, the reference architecture model of the International Data Spaces (IDS) initiative, to develop open data ecosystems. The chapter provides a proof of concept of the feasibility of the IDS architecture for public data spaces. Its aim is to lay a solid foundation for building and maintaining interoperable open data ecosystems that can address existing challenges.

The following is a summary of the views gathered in the chapter.

Open data ecosystems

Data spaces are ecosystems where different actors share data voluntarily and securely, following common governance, organisational, regulatory and technical mechanisms.

IDSA (International Data Spaces Association) was created in 2016 with the aim of boosting the global digital economy through a secure and sovereign system of data exchange in which all participants can obtain the maximum value from their data. It is a coalition of more than 130 international companies, with representation in more than 20 countries.

Among other initiatives, it promotes a reference architecture model called IDS-RAM, which aims to facilitate data exchange so as to maximise its value without losing control over it. Its approach applies to both private and open data because it is based on metadata repositories for sharing information. The data remain under the control of their owners, while standardised metadata are managed centrally for sharing.

Creating data spaces involves a number of risks that must be addressed from both the consumer's and the provider's point of view. Data providers focus on legal compliance, including issues such as data ownership. Common standards exist for aspects such as metadata description. The World Wide Web Consortium (W3C), well aware of the problem, proposed its Data Catalog Vocabulary (DCAT) several years ago as a standard for describing data catalogues. In practice, however, interoperability often falls well short of its potential. Metadata are sometimes incomplete or of poor quality, data are outdated, and accessing data and interoperating with other systems can be difficult.
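The metadata problems listed above can be made concrete with a minimal Python sketch that audits a DCAT-style dataset description for missing properties and stale modification dates. The record is a plain dictionary using DCAT (`dcat:`) and Dublin Core (`dct:`) property names. Which properties count as required, and the 365-day staleness threshold, are assumptions made for illustration and are not requirements of the DCAT standard.

```python
from datetime import date

# Properties this sketch treats as required. DCAT itself makes far fewer
# properties mandatory; this list is an illustrative assumption.
REQUIRED = ["dct:title", "dct:description", "dct:publisher",
            "dct:modified", "dcat:distribution"]

MAX_AGE_DAYS = 365  # assumed threshold for flagging a dataset as outdated


def audit(record: dict, today: date) -> list[str]:
    """Return human-readable quality problems found in one dataset record."""
    problems = [f"missing {prop}" for prop in REQUIRED if not record.get(prop)]

    modified = record.get("dct:modified")
    if modified:
        age = (today - date.fromisoformat(modified)).days
        if age > MAX_AGE_DAYS:
            problems.append(f"outdated: last modified {age} days ago")

    # Each distribution should tell consumers where the data can be fetched.
    for i, dist in enumerate(record.get("dcat:distribution", [])):
        if not (dist.get("dcat:accessURL") or dist.get("dcat:downloadURL")):
            problems.append(f"distribution {i} has no access or download URL")
    return problems


# Usage: a hypothetical, poorly described dataset.
record = {
    "@type": "dcat:Dataset",
    "dct:title": "Bus stops",
    "dct:modified": "2020-01-15",
    "dcat:distribution": [{"dct:format": "CSV"}],
}
for problem in audit(record, today=date(2022, 10, 28)):
    print(problem)
```

A portal could run checks like these when harvesting catalogues. The sketch only flags problems and does not attempt to repair them.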

The applicability of IDS-RAM in open data environments

IDS offers an approach based on guaranteeing data sovereignty to providers, facilitating data exchange and addressing the concerns of both consumers and providers.

The concepts and technologies underlying open data and IDS-RAM are very similar. Both rely on metadata repositories to share information about the availability and accessibility of data. These repositories store only metadata, without transferring the actual data. Both therefore follow the principles of decentralisation and of transferring metadata to and from central information access points. The actual data remain on the publisher's infrastructure until a user requests them. In addition, the IDS information model is based on the principles of Linked Data and DCAT. This makes it readily compatible with open data portals and promotes interoperability between data spaces and open data portals.

The architecture proposed by IDS is based mainly on two artefacts: a connector to data sources (Open Data Connector) and a metadata repository (Open Data Broker). Both are shown in the following image, taken from the book "Designing Data Spaces":

Figure illustrating the IDS-RAM architecture, explained below.

  • Open Data Connector: takes on the role of open data provider. Each publishing entity deploys an instance of the connector to announce the availability of its data and grant access to it. Because the data are open, and therefore public, there is no need for usage policies or restrictions as strict as those of other, private-data connectors based on this architecture. In principle, this makes configuration and management easier.
  • Open Data Broker: this centralised metadata repository plays a role similar to that of an open data portal. Based on this metadata, the portal interface lets users locate the data and download it from the connectors.
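The division of roles between the two artefacts can be sketched in a few lines of Python. This is a minimal in-memory model, not the actual IDS connector or broker software. The class and method names (`OpenDataConnector`, `OpenDataBroker`, `publish`, `self_description`, `register`, `search`, `fetch`, `download`) are hypothetical, and network communication is replaced by direct method calls. The sketch shows that the broker only ever stores metadata, while the data stay with the connector until a consumer requests them.

```python
from dataclasses import dataclass, field


@dataclass
class OpenDataConnector:
    """Publisher side: keeps the data and announces what is available."""
    connector_id: str
    datasets: dict[str, bytes] = field(default_factory=dict)  # dataset id -> payload
    metadata: dict[str, dict] = field(default_factory=dict)   # dataset id -> description

    def publish(self, dataset_id: str, payload: bytes, title: str) -> None:
        self.datasets[dataset_id] = payload
        self.metadata[dataset_id] = {"id": dataset_id, "title": title,
                                     "connector": self.connector_id}

    def self_description(self) -> list[dict]:
        # Only metadata leaves the connector; payloads are never included.
        return list(self.metadata.values())

    def fetch(self, dataset_id: str) -> bytes:
        # Open data: no usage policy is checked before handing out the payload.
        return self.datasets[dataset_id]


@dataclass
class OpenDataBroker:
    """Central metadata repository, playing the role of an open data portal."""
    index: dict[str, dict] = field(default_factory=dict)  # dataset id -> metadata
    # Stand-in for the connectors' network endpoints in this in-memory model.
    endpoints: dict[str, OpenDataConnector] = field(default_factory=dict)

    def register(self, connector: OpenDataConnector) -> None:
        self.endpoints[connector.connector_id] = connector
        for meta in connector.self_description():
            self.index[meta["id"]] = meta

    def search(self, text: str) -> list[dict]:
        # Case-insensitive match on titles, using metadata only.
        return [m for m in self.index.values() if text.lower() in m["title"].lower()]

    def download(self, dataset_id: str) -> bytes:
        # The broker locates the dataset; the bytes come from the connector.
        owner = self.index[dataset_id]["connector"]
        return self.endpoints[owner].fetch(dataset_id)


# Usage: a hypothetical municipality publishes a dataset and registers with a broker.
city = OpenDataConnector("city-of-example")
city.publish("air-quality-2022", b"station,pm10\nA,21\n", "Air quality 2022")

broker = OpenDataBroker()
broker.register(city)
hits = broker.search("air")
print(hits[0]["id"], broker.download(hits[0]["id"]))
```

Keeping the payloads out of `OpenDataBroker.index` mirrors the principle that central access points hold only metadata.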

This setup allows metadata to be grouped by application domain. Centralised metadata repositories can be created for sectors such as health or tourism, as well as at municipal, regional, national or international level.
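Grouping repositories by sector or administrative level can be pictured as a publisher choosing which brokers to announce a dataset to. In the hypothetical Python sketch below, the broker names, the `domain`/`levels` tagging scheme and the `route` helper are all assumptions made for illustration. IDS-RAM does not prescribe this routing logic.

```python
# Hypothetical brokers, each scoped to a sector or an administrative level.
BROKERS = {
    "health-broker": {"domain": "health"},
    "tourism-broker": {"domain": "tourism"},
    "regional-broker": {"level": "regional"},
    "national-broker": {"level": "national"},
}


def route(dataset: dict, chosen: set[str]) -> list[str]:
    """Return the brokers, among those the publisher chose, that match the dataset."""
    matches = []
    for name, scope in BROKERS.items():
        if name not in chosen:
            continue  # the publisher decides which brokers it signs up to
        same_domain = scope.get("domain") == dataset["domain"]
        same_level = scope.get("level") in dataset["levels"]
        if same_domain or same_level:
            matches.append(name)
    return matches


# Usage: a regional health dataset, announced only to two chosen brokers.
hospital_beds = {"id": "hospital-beds", "domain": "health",
                 "levels": {"regional", "national"}}
print(route(hospital_beds, chosen={"health-broker", "regional-broker"}))
```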

In a data ecosystem such as the one proposed by IDS, the connector reports available or updated data, and the metadata repository is updated accordingly. Changes in data availability are announced through communication mechanisms based on the IDS information model and the IDS Communication Protocol (IDSCP). This keeps the information about available data up to date.
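The update mechanism described above can be sketched as a broker reacting to notifications that connectors push to it. This is a simplified Python model under stated assumptions. The `Notification` type and the message kinds `resource_update` and `resource_unavailable` are hypothetical stand-ins for the messages defined by the IDS information model. No IDSCP transport, authentication or serialisation is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Notification:
    """A message pushed by a connector to the broker (hypothetical format)."""
    kind: str                        # "resource_update" or "resource_unavailable"
    dataset_id: str
    metadata: Optional[dict] = None  # new metadata, only sent with updates


class Broker:
    def __init__(self) -> None:
        self.index: dict[str, dict] = {}  # dataset id -> latest metadata

    def handle(self, msg: Notification) -> None:
        if msg.kind == "resource_update":
            # Insert or overwrite: the connector's announcement is authoritative.
            self.index[msg.dataset_id] = msg.metadata
        elif msg.kind == "resource_unavailable":
            # Withdraw the entry so consumers are not sent to a dead link.
            self.index.pop(msg.dataset_id, None)
        else:
            raise ValueError(f"unknown message kind: {msg.kind}")


# Usage: a dataset is updated twice, another is announced and then withdrawn.
broker = Broker()
broker.handle(Notification("resource_update", "traffic", {"title": "Traffic counts", "version": 1}))
broker.handle(Notification("resource_update", "traffic", {"title": "Traffic counts", "version": 2}))
broker.handle(Notification("resource_update", "parks", {"title": "Parks"}))
broker.handle(Notification("resource_unavailable", "parks"))
print(broker.index)
```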

In open data portals that aggregate a large number of data sources, accessibility and overall usability depend on the metadata supplied by the original data providers. Standards such as DCAT provide a common basis, but IDS imposes more stringent specifications on the communication process.

Although this is an interesting proposal, it has not yet been implemented in any open data space. However, proofs of concept already exist, such as the Public Data Space, a showcase available since December 2020 that reproduces how the solution works. In it, connectors expose the open data offerings of different data portals in Germany and are registered in a metadata repository.

The following image shows the workflow of an IDS-RAM-based model versus a more traditional approach:

Image comparing the traditional data workflow with the IDS proposal, explained below.

Conclusions

Open data portals provide access to open data from a variety of providers. The overall usability of these portals depends to some extent on the discoverability of the data, which in turn depends on the quality of their metadata.

Unavailable data and dead links are problems that sometimes occur in open data environments. To counteract them, portals typically harvest publishers' data catalogues periodically and run availability checks. In an open data ecosystem based on IDS-RAM, by contrast, the connector informs the broker about available or updated datasets. In typical open data environments, the portal is responsible for keeping this information current through a 'pull' approach. The IDS ecosystem reverses this into a 'push' approach, which places responsibility for maintaining the data supply on the publisher. It also opens up new ways to control the dissemination of data. With IDS-RAM, the publisher chooses which metadata brokers it registers with, giving it greater sovereignty over its data.
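The practical difference between the two approaches can be illustrated by calculating how long the central repository's view lags behind a publisher's changes. This Python sketch is a simplified model with assumed numbers. The update times (in hours) and the 24-hour harvesting interval are illustrative and are not taken from any real portal, and availability checks are not modelled. With pull, a change is only seen at the next scheduled harvest. With push, it is seen as soon as the connector's notification arrives, here assumed to be instant by default.

```python
import math


def pull_lag(update_times: list[float], interval: float) -> list[float]:
    """Delay until the next periodic harvest picks up each update (pull model)."""
    lags = []
    for t in update_times:
        next_harvest = math.ceil(t / interval) * interval  # harvests at 0, interval, 2*interval, ...
        lags.append(next_harvest - t)
    return lags


def push_lag(update_times: list[float], notify_delay: float = 0.0) -> list[float]:
    """Delay when the connector notifies the broker itself (push model)."""
    return [notify_delay for _ in update_times]


updates = [5.0, 25.0, 47.0]  # hypothetical hours at which the publisher changes its data
print(pull_lag(updates, interval=24.0))
print(push_lag(updates))
```

Under pull, the lag for any update can approach the full harvesting interval. Under push, it is bounded by the notification delay alone.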

For data consumers, this approach can make it easier to find data in a timely manner and can reduce fragmentation. Moreover, if open data can be acquired, managed and processed with the same tools and applications already used in industry, the possibilities for integration and reuse multiply.


Content prepared by Juan Mañes, expert in Data Governance, with contributions from the Data Office.

The contents and views expressed in this publication are the sole responsibility of the author.