The digitalization in the public sector in Spain has also reached the judicial field. The first regulation to establish a legal framework in this regard was the reform that took place through Law 18/2011, of July 5th (LUTICAJ). Since then, there have been advances in the technological modernization of the Administration of Justice. Last year, the Council of Ministers approved a new legislative package to definitively address the digital transformation of the public justice service, the Digital Efficiency Bill.
This project incorporates various measures specifically aimed at promoting data-driven management, in line with the overall approach formulated through the so-called Data Manifesto promoted by the Data Office.
Once the decision to embrace data-driven management has been made, it must be approached taking into account the requirements and implications of Open Government, so that not only the possibilities for improvement in the internal management of judicial activity are strengthened, but also the possibilities for reuse of the information generated as a result of the development of said public service (RISP).
Open data: a premise for the digital transformation of justice
To address the challenge of the digital transformation of justice, data openness is a fundamental requirement. In this regard, open data requires conditions that allow their automated integration in the judicial field. First, an improvement in the accessibility conditions of the data sets must be carried out, which should be in interoperable and reusable formats. In fact, there is a need to promote an institutional model based on interoperability and the establishment of homogeneous conditions that, through standardization adapted to the singularities of the judicial field, facilitate their automated integration.
In order to deepen the synergy between open data and justice, the report prepared by expert Julián Valero identifies the keys to digital transformation in the judicial field, as well as a series of valuable open data sources in the sector.
If you want to learn more about the content of this report, you can watch the interview with its author.
Below, you can download the full report, the executive summary, and a summary presentation.
The combination and integration of open data with artificial intelligence (AI) is an area of work that has the potential to achieve significant advances in multiple fields and bring improvements to various aspects of our lives. The most frequently mentioned area of synergy is the use of open data as input for training the algorithms used by AI since these systems require large amounts of data to fuel their operations. This makes open data an essential element for AI development and utilizing it as input brings additional advantages such as increased equality of access to technology and improved transparency regarding algorithmic functioning.
Today, we can find open data powering algorithms for AI applications in diverse areas such as crime prevention, public transportation development, gender equality, environmental protection, healthcare improvement, and the creation of more friendly and liveable cities. All of these objectives are more easily attainable through the appropriate combination of these technological trends.
However, as we will see next, when envisioning the joint future of open data and AI, the combined use of both concepts can also lead to many other improvements in how we currently work with open data throughout its entire lifecycle. Let's review step by step how artificial intelligence can enrich a project with open data.
Utilizing AI to Discover Sources and Prepare Data Sets
Artificial intelligence can assist right from the initial steps of our data projects by supporting the discovery and integration of various data sources, making it easier for organizations to find and use relevant open data for their applications. Furthermore, future trends may involve the development of common data standards, metadata frameworks, and APIs to facilitate the integration of open data with AI technologies, further expanding the possibilities of automating the combination of data from diverse sources.
In addition to automating the guided search for data sources, AI-driven automated processes can be helpful, at least in part, in the data cleaning and preparation process. This can improve the quality of open data by identifying and correcting errors, filling gaps in the data, and enhancing its completeness. This would free scientists and data analysts from certain basic and repetitive tasks, allowing them to focus on more strategic activities such as developing new ideas and making predictions.
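As a minimal sketch of the repetitive preparation work described above, the following Python snippet fills gaps in a numeric column with the median of the observed values and replaces implausible entries. The record layout, the field name `value` and the plausibility bounds are assumptions made purely for illustration; an AI-assisted pipeline would typically infer such rules from the data rather than hard-code them.

```python
from statistics import median


def clean_records(records, field="value", lower=0.0, upper=1000.0):
    """Fill missing values with the median and correct out-of-range entries.

    `records` is a list of dicts. `field`, `lower` and `upper` are
    illustrative assumptions, not part of any real dataset schema.
    """
    # Values that are present and inside the plausible range.
    observed = [
        r[field]
        for r in records
        if r.get(field) is not None and lower <= r[field] <= upper
    ]
    fill = median(observed) if observed else None

    cleaned = []
    for r in records:
        row = dict(r)  # copy, so the input records are left untouched
        value = row.get(field)
        if value is None:
            row[field] = fill
            row["status"] = "imputed"      # gap filled
        elif not (lower <= value <= upper):
            row[field] = fill
            row["status"] = "corrected"    # implausible value replaced
        else:
            row["status"] = "ok"
        cleaned.append(row)
    return cleaned


sample = [{"value": 10.0}, {"value": None}, {"value": 30.0}, {"value": -5.0}]
result = clean_records(sample)
```

Keeping a `status` flag on each row means that an analyst can still tell which values were observed and which were filled in automatically.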
Innovative Techniques for Data Analysis with AI
One characteristic of AI models is their ability to detect patterns and knowledge in large amounts of data. AI techniques such as machine learning, natural language processing, and computer vision can easily be used to extract new perspectives, patterns, and knowledge from open data. Moreover, as technological development continues to advance, we can expect the emergence of even more sophisticated AI techniques specifically tailored for open data analysis, enabling organizations to extract even more value from it.
Simultaneously, AI technologies can help us go a step further in data analysis by facilitating and assisting in collaborative data analysis. Through this process, multiple stakeholders can work together on complex problems and find answers through open data. This would also lead to increased collaboration among researchers, policymakers, and civil society communities in harnessing the full potential of open data to address social challenges. Additionally, this type of collaborative analysis would contribute to improving transparency and inclusivity in decision-making processes.
The Synergy of AI and Open Data
AI can also be used to automate many tasks involved in data presentation, such as creating interactive visualizations simply by providing instructions in natural language or a description of the desired visualization.
On the other hand, open data enables the development of applications that, combined with artificial intelligence, can provide innovative solutions. The development of new applications driven by open data and artificial intelligence can contribute to various sectors such as healthcare, finance, transportation, or education, among others. For example, chatbots are being used to provide customer service, algorithms for investment decisions, or autonomous vehicles, all powered by AI. By using open data as the primary data source for these services, we would achieve higher
Finally, AI can also be used to analyze large volumes of open data and identify new patterns and trends that would be difficult to detect through human intuition alone. This information can then be used to make better decisions, such as what policies to pursue in each area to bring about the desired changes.
These are just some of the possible future trends at the intersection of open data and artificial intelligence, a future full of opportunities but at the same time not without risks. As AI continues to develop, we can expect to see even more innovative and transformative applications of this technology. This will also require closer collaboration between artificial intelligence researchers and the open data community in opening up new datasets and developing new tools to exploit them. This collaboration is essential in order to shape the future of open data and AI together and ensure that the benefits of AI are available to all in a fair and equitable way.
Content prepared by Carlos Iglesias, Open data Researcher and consultant, World Wide Web Foundation.
The contents and views reflected in this publication are the sole responsibility of the author.
Open data is a highly valuable source of knowledge for our society. Thanks to it, applications can be created that contribute to social development and solutions that help shape Europe's digital future and achieve the Sustainable Development Goals (SDGs).
The European Open Data portal (data.europa.eu) organizes online events to showcase projects that have been carried out using open data sources and have helped address some of the challenges our society faces: from combating climate change and boosting the economy to strengthening European democracy and digital transformation.
In the current year, 2023, four seminars have been held to analyze the positive impact of open data on each of the mentioned themes. All the material presented at these events is published on the European data portal, and recordings are available on their YouTube channel, accessible to any interested user.
In this post, we take a first look at the showcased use cases related to boosting the economy and democracy, as well as the open data sets used for their development.
Solutions Driving the European Economy and Lifestyle
In a rapidly evolving world where economic challenges and aspirations for a prosperous lifestyle converge, the European Union has demonstrated an unparalleled ability to forge innovative solutions that not only drive its own economy but also elevate the standard of living for its citizens. In this context, open data has played a pivotal role in the development of applications that address current challenges and lay the groundwork for a prosperous and promising future. Two of these projects were presented in the second webinar of the series "Stories of Use Cases", an event focused on "Open Data to Foster the European Economy and Lifestyle": UNA Women and YouthPOP.
The first project focuses on tackling one of the most relevant challenges we must overcome to achieve a just society: gender inequality. Closing the gender gap is a complex social and economic issue. According to estimates from the World Economic Forum, it will take 132 years to achieve full gender parity in Europe. The UNA Women application aims to reduce that figure by providing guidance to young women so they can make better decisions regarding their education and early career steps. In this use case, the company ITER IDEA has used over 6 million lines of processed data from various sources, such as data.europa.eu, Eurostat, Censis, Istat (Italy's National Institute of Statistics), and NUMBEO.
The second presented use case also targets the young population. This is the YouthPOP application (Youth Public Open Procurement), a tool that encourages young people to participate in public procurement processes. For the development of this app, data from data.europa.eu, Eurostat, and ESCO, among others, have been used. YouthPOP aims to improve youth employment and contribute to the proper functioning of democracy in Europe.
Open Data for Boosting and Strengthening European Democracy
In this regard, the use of open data also contributes to strengthening and consolidating European democracy. Open data plays a crucial role in our democracies through the following avenues:
- Providing citizens with reliable information.
- Promoting transparency in governments and public institutions.
- Combating misinformation and fake news.
The theme of the third webinar on use cases organized by data.europa.eu was "Open Data and a New Impetus for European Democracy". This event presented two innovative solutions: EU Integrity Watch and the Institute for Development of Freedom of Information (IDFI).
Firstly, EU Integrity Watch is a platform that provides online tools for citizens, journalists, and civil society to monitor the integrity of decisions made by politicians in the European Union. This website offers visualizations to understand the information and provides access to collected and analyzed data. The analyzed data is used in scientific disclosures, journalistic investigations, and other areas, contributing to a more open and transparent government. This tool processes and offers data from the Transparency Register.
The second initiative presented in the webinar on democracy and open data is the Institute for Development of Freedom of Information (IDFI), a Georgian non-governmental organization that focuses on monitoring and supervising government actions, revealing infractions, and keeping citizens informed.
The main activities of the IDFI include requesting public information from relevant bodies, creating rankings of public bodies, monitoring the websites of these bodies, and advocating for improved access to public information, legislative standards, and related practices. This project obtains, analyzes, and presents open data sets from national public institutions.
In conclusion, open data makes it possible to develop applications that reduce the gender wage gap, boost youth employment, or monitor government actions. These are just a few examples of the value that open data can offer to society.

Learn more about these applications in the recordings of their seminars.
Open data is the highest level of data sharing, as it is freely available and accessible to all. Properly processed and with full respect for the protection of personal data, it can help citizens, businesses and the public sector to make better decisions.
Open data, together with other data, play a key role in the creation of data spaces, as referred to in the European Data Strategy. As stated in the document, the implementation of common and interoperable data spaces in strategic sectors is set up with the aim of "overcoming technical and legal barriers to data sharing between organisations, combining the necessary tools and infrastructures and addressing trust issues", for example through common standards developed for the space.
In view of its relevance, the European Data Portal Academy has organised a series of webinars on data spaces. The first of these was held on 12 May in an online format and can be viewed here. In it, the new developments and progress being made regarding data spaces were mentioned, developments that in Spain are being carried out by the Data Office.
We summarise below the main aspects addressed in this first seminar, in which Daniele Rizzi, Principal administrator and policy officer and Johan Bodenkamp, Policy and project officer at the Directorate General for Communication Networks, Content and Technologies of the European Commission, participated, with the moderation of Giulia Carsaniga, Research and Policy Lead Consultant at Capgemini.
Data spaces and the EU's digital strategy
The first part of the seminar highlighted how digital transformation is one of the European Union's top priorities. In fact, Europe has a specific strategy to advance in this area: achieving 'A Europe fit for the digital age' is one of the six 2019-24 priorities of the European Commission.
The European Union's digital strategy aims to make digital transformation benefit people and businesses. The European Data Strategy of February 2020 is framed in this context and includes a series of measures to promote a European data market, similar to the European Common Market that was the seed of the current EU.
The creation of this European data market requires the establishment of a series of actions and standards with a focus on data, technology and infrastructure. A collective effort, including public programmes such as DIGITAL Europe and private programmes such as Gaia-X, is also contributing to this.
One year after the approval of the European Data Strategy, the European Council acknowledged in March 2021 "the need to accelerate the creation of common data spaces and ensure access and interoperability of data" and invited the Commission to "present the progress made and the remaining measures necessary to establish the sectoral data spaces announced in the European Data Strategy of February 2020." Subsequently, in February 2022, the European Commission published a working document on the European data market.
After contextualizing the development of the concept of data spaces within the European framework, the webinar presenters went on to explain the key components that will be part of the data spaces, some of which are already operational and others are still in development. The seminar provided an overview of what the European data space is expected to be like, highlighting the following aspects:
Firstly, there was a discussion about high-value datasets from the public sector. In January of this year, the European Commission published a list of high-value datasets, which are understood as those that provide added value and significant benefits to society. There is a wide variety of high-value data in different areas (health, agriculture, mobility, energy, etc.) that stakeholders make available with varying degrees of openness. As explained in the webinar, the idea is to start creating common high-value data spaces in more homogeneous areas, although the ultimate goal is for data to be shared across all sectors within the European market, as most applications will require data from different domains.
To support the creation of these data spaces, the first initiative launched in Europe is the establishment of the Data Spaces Support Centre. This center explores the needs of data space initiatives, defines common requirements, establishes best practices to accelerate the formation of sovereign data spaces as a crucial element of digital transformation in all areas, and ensures interoperability through compliance with common standards.
In order for all of this to be developed, a technical infrastructure for data spaces is necessary, which facilitates cloud and edge-cloud services, intelligent middleware solutions (Simpl), a digital marketplace, high-performance computing, an on-demand artificial intelligence platform, and AI testing and experimentation facilities.
Differences and similarities between data spaces and datalakes
After providing an overview of data spaces in Europe, the seminar addressed their main characteristics. In this regard, a data space was presented as a secure and privacy-respecting IT infrastructure for aggregating, accessing, processing, using, and sharing data. It was also defined as a data governance mechanism that comprises a set of administrative and contractual rules that determine the rights of access, processing, use, and sharing of data in a reliable, transparent, and compliant manner with applicable legislation.
One of the features highlighted in the webinar regarding this type of infrastructure is that data owners have control over who can access which data, for what purpose, and under what conditions they can be used. Additionally, there is a large amount of voluntarily available data that can be reused either for free or in exchange for compensation, depending on the decisions of the data owners.
Furthermore, it was emphasized that data spaces involve the participation of an open number of organizations/individuals, respecting competition rules and ensuring non-discriminatory access for all participants.
Another concept discussed in the seminar was that of datalakes, in comparison to data spaces. Datalakes were defined as repositories that allow storing structured and unstructured data at any scale. In a datalake, as explained in the seminar, data can be stored as is, without the need for prior structuring, and different types of analyses can be performed, ranging from dashboards and visualizations to real-time data processing and machine learning for more informed decision-making. Accessing the datalake implies the possibility of accessing all the contained data, not necessarily in an organized manner.
On the other hand, a data space, according to the presenters, can be defined as a federated data ecosystem based on shared policies and rules. Users of data spaces have the ability to securely, transparently, reliably, easily, and uniformly access data. In a data space, data owners have control over the access and use of their data. From a technical perspective, a data space can be seen as a data integration concept that does not require common database schemas or physical data integration but is based on distributed and integrated data stores as needed.
Using a fishing analogy, in a datalake, the user has to catch the fish themselves, while a data space would be like going to a fish market.
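The fish market analogy can be made a little more concrete with a small Python sketch. It models a catalogue that holds only offers and usage policies, while the data stays with its owner, who decides which consumers may use it and for what purpose. All class names, participant names and purposes here are hypothetical and chosen for illustration; they do not correspond to any actual data space specification or to Simpl.

```python
from dataclasses import dataclass


@dataclass
class UsagePolicy:
    """Conditions set by the data owner (illustrative assumption)."""
    allowed_consumers: set
    allowed_purposes: set


@dataclass
class DataOffer:
    owner: str
    dataset: str
    policy: UsagePolicy


class DataSpaceCatalogue:
    """Federated catalogue: stores offers and policies, not the data itself."""

    def __init__(self):
        self._offers = {}

    def publish(self, offer):
        # The owner registers an offer; the dataset stays in the owner's store.
        self._offers[offer.dataset] = offer

    def request_access(self, consumer, dataset, purpose):
        # Access is granted only if the owner's policy covers both the
        # requesting participant and the declared purpose.
        offer = self._offers.get(dataset)
        if offer is None:
            return False
        policy = offer.policy
        return (consumer in policy.allowed_consumers
                and purpose in policy.allowed_purposes)


catalogue = DataSpaceCatalogue()
catalogue.publish(DataOffer(
    owner="city-transport-agency",
    dataset="bus-timetables",
    policy=UsagePolicy({"mobility-startup"}, {"route-planning"}),
))

granted = catalogue.request_access("mobility-startup", "bus-timetables", "route-planning")
denied = catalogue.request_access("mobility-startup", "bus-timetables", "advertising")
```

In a datalake, by contrast, there would be no per-dataset policy check of this kind: whoever can access the repository can, in principle, read everything stored in it.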
Next steps: Governance framework and European actors
Once the difference between data spaces and datalakes was presented, the webinar addressed the paradigm shift in data sharing that is currently taking place. Until now, bilateral data exchange based on contractual agreements has been common. However, a new model of data exchange infrastructure with centralized data hosting and/or data markets is gaining momentum, which reduces transaction costs when data is not maintained in a central repository.
According to the presenters, the next step in the evolution of data spaces would be the creation of links between participants in a model where data is federated and stored in a distributed manner, with tools that enable search, access, and analysis across multiple industries, companies, and entities.
To make this process happen, as explained by the presenters, the support and coordinated work of different actors are necessary. On one hand, it would be essential to establish common rules that facilitate data exchange and bring the different stakeholders closer to a common data policy in the EU. Similarly, providing technical solutions and financial support is indispensable.
In this regard, the webinar highlighted an important milestone: the establishment of the European Data Innovation Board (EDIB), which will support the Commission in publishing guidelines to facilitate the development of common European data spaces and identifying the necessary standards and interoperability requirements for data exchange.
As mentioned earlier, the implementation of data spaces requires technical architecture, and the webinar highlighted two free technical solutions:
- Building Blocks: Open and reusable digital solutions based on standards that enable basic functionalities, such as reliable authentication and secure data exchange.
- Simpl: The intelligent middleware that will enable cloud-based federations and edge-cloud. It will support major data initiatives funded by the European Commission, such as the common European data spaces.
The key role of the Data Spaces Support Centre
Towards the end of the seminar, the Data Spaces Support Centre (DSSC) initiative was presented in more detail. This center, established in October 2022, provides support to various initiatives in the creation of data spaces and is expected to conclude its activities in March 2026. It consists of twelve partners and also has sixteen collaborating partners, including important associations and companies with expertise in the field of data exchange.
The main mission of the DSSC is to create a network of partners and a community to provide tools for the creation of data spaces. It focuses particularly on interoperability and aims to generate synergies at the European level for the development of data spaces.
The webinar reviewed the collaborations and initiatives in which the Data Spaces Support Centre participates, and it was highlighted that the starter kit, a starting point for building data spaces, is available on its website.
In the final stretch of the seminar, an overview of the relevant actors in the European common data space was provided:
- Data Spaces Support Centre (DSSC): Responsible for coordinating relevant actions in data spaces.
- Data Space Coordination and Support Actions (CSAs): Focused on sectoral data spaces.
- European Data Innovation Board: Starting from September 2023, it will be responsible for setting guidelines to achieve interoperability in data spaces.
If you want to know more about the concept of data spaces and their relevance today, you can watch the full seminar in the following video:
The following training material is now available on data.europa academy:
- The recording of the session;
- The slide deck presented during the webinar.
The emergence of artificial intelligence (AI), and ChatGPT in particular, has become one of the main topics of debate in recent months. This tool has even eclipsed other emerging technologies that had gained prominence in a wide range of fields (legal, economic, social and cultural). This is the case, for example, of web 3.0, the metaverse, decentralised digital identity or NFTs and, in particular, cryptocurrencies.
There is an unquestionable direct relationship between this type of technology and the need for sufficient and appropriate data, and it is precisely this last qualitative dimension that justifies why open data is called upon to play a particularly important role. Although, at least for the time being, it is not possible to know how much open data provided by public sector entities is used by ChatGPT to train its model, there is no doubt that open data is a key to improving their performance.
Regulation on the use of data by AI
From a legal point of view, AI is arousing particular interest in terms of the guarantees that must be respected when it comes to its practical application. Thus, various initiatives are being promoted that seek to specifically regulate the conditions for its use, among which the proposal being processed by the European Union stands out, where data are the object of special attention.
At the state level, Law 15/2022, of 12 July, on equal treatment and non-discrimination, was approved a few months ago. This regulation requires public administrations to promote the implementation of mechanisms that include guarantees regarding the minimisation of bias, transparency and accountability, specifically with regard to the data used to train the algorithms used for decision-making.
There is a growing interest on the part of the autonomous communities in regulating the use of data by AI systems, in some cases reinforcing guarantees regarding transparency. Also, at the municipal level, protocols are being promoted for the implementation of AI in municipal services in which the guarantees applicable to the data, particularly from the perspective of their quality, are conceived as a priority requirement.
The possible collision with other rights and legal interests: the protection of personal data
Beyond regulatory initiatives, the use of data in this context has been the subject of particular attention as regards the legal conditions under which it is admissible. Thus, it may be the case that the data to be used are protected by third party rights that prevent - or at least hinder - their processing, such as intellectual property or, in particular, the protection of personal data. This concern is one of the main motivations for the European Union to promote the Data Governance Regulation, a regulation that proposes technical and organisational solutions that attempt to make the re-use of information compatible with respect for these legal rights.
Indeed, the possible collision with the right to the protection of personal data has motivated the main measures that have been adopted in Europe regarding the use of ChatGPT. In this regard, the Garante per la Protezione dei Dati Personali has ordered a precautionary measure to limit the processing of Italian citizens' data, the Spanish Data Protection Agency has initiated ex officio inspections of OpenAI as data controller and, with a supranational scope, the European Data Protection Board (EDPB) has created a specific working group.
The impact of the regulation on open data and re-use
The Spanish regulation on open data and re-use of public sector information establishes some provisions that must be taken into account by AI systems. Thus, in general, re-use will be admissible if the data has been published without conditions or, in the event that conditions are set, when they comply with those established through licences or other legal instruments; although, when they are defined, the conditions must be objective, proportionate, non-discriminatory and justified by a public interest objective.
As regards the conditions for re-use of information provided by public sector bodies, the processing of such information is only allowed if the content is not altered and its meaning is not distorted, and the source of the data and the date of its most recent update must be mentioned.
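From a purely technical point of view, a re-user could keep track of these conditions with something like the following Python sketch. It stores a fingerprint of the content as received, together with a notice stating the source and the date of the most recent update. The function names, the wording of the notice and the sample values are assumptions made for illustration. A matching hash only shows that the bytes are unchanged; it is not a test of legal compliance and says nothing about whether the meaning has been distorted.

```python
import hashlib
from datetime import date


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of the dataset exactly as received."""
    return hashlib.sha256(content).hexdigest()


def reuse_record(content: bytes, source: str, last_updated: date) -> dict:
    """Keep what is needed to cite the source and to detect alteration."""
    return {
        "sha256": fingerprint(content),
        "attribution": (
            f"Source: {source}. "
            f"Last updated: {last_updated.isoformat()}."
        ),
    }


def is_unaltered(content: bytes, record: dict) -> bool:
    """True if the content still matches the fingerprint taken on receipt."""
    return fingerprint(content) == record["sha256"]


# Hypothetical dataset and publisher, used only to exercise the functions.
original = b"municipality,population\nExampleville,12000\n"
record = reuse_record(original, "Example Statistics Office", date(2023, 5, 12))
```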
On the other hand, high-value datasets are of particular interest for these AI systems characterised by the intense re-use of third-party content given the massive nature of the data processing they carry out and the immediacy of the requests for information made by users. Specifically, the conditions established by law for the provision of these high-value datasets by public bodies mean that there are very few limitations and also that their re-use is greatly facilitated by the fact that the data must be freely available, be susceptible to automated processing, be provided through APIs and be provided in the form of mass downloading, where appropriate.
In short, considering the particularities of this technology and, therefore, the very unique circumstances in which the data are processed, it seems appropriate that the licences and, in general, the conditions under which public entities allow their re-use be reviewed and, where appropriate, updated to meet the legal challenges that are beginning to arise.
Content prepared by Julián Valero, Professor at the University of Murcia and Coordinator of the "Innovation, Law and Technology" Research Group (iDerTec).
The contents and points of view reflected in this publication are the sole responsibility of the author.
The European Data Strategy envisages, among other measures, the implementation of a series of sectoral data spaces, in strategic areas and areas of particular public interest, with the aim of facilitating the "availability of large data repositories in these sectors, together with the necessary tools and technical infrastructures to use and exchange data, as well as appropriate governance mechanisms".
Specifically, according to the European Commission's working document on data spaces, they are promoted with the aim of "overcoming legal and technical barriers to data sharing, combining the necessary tools and infrastructures and addressing trust issues through common standards".
As recognised in the document, such spaces require not only the implementation of adequate infrastructures, but also the design of enabling governance frameworks, the latter of which poses significant challenges from a legal perspective. Although there is no normative definition of data spaces, according to the document, at EU level they are considered as ecosystems where data from the public sector, businesses and individuals, as well as research institutions and other types of organisations, are available and exchanged in a reliable and secure way.
Beyond the initiatives being promoted at European level and, specifically, their institutional and legal configuration, the creation of data spaces is also being promoted at state level in Spain, in particular by the Data Office, an administrative body directly under the Secretary of State for Digitalisation and Artificial Intelligence. In this respect, the Office's main functions include "the creation of spaces for data sharing between companies, citizens and public administrations in a secure and governed manner (sandboxes, national and European data spaces, data ecosystems for both public and private sector use, etc.)", as well as "the development of secure access mechanisms to these data platforms, for data-based public decision-making or for business use".
These spaces are set to play an essential role in the context of the Recovery, Transformation and Resilience Plan, particularly in the industrial sphere, one of whose main objectives is to facilitate the modernisation and productivity of the Spanish industry-services ecosystem through the digitisation of the value chain and, specifically, by boosting business innovation based on the intensive use of data. Among the main areas where the creation of these spaces, identified in the Digital Agenda 2025 and the aforementioned Plan, is planned, are important sectors such as agri-food, sustainable mobility, health, trade and tourism. In particular, their implementation will be carried out "through the development of use cases, demonstrators and pilots, and public-private sectoral innovation ecosystems around these data spaces".
The configuration of data spaces
Given the absence of a specific regulation on data spaces, their configuration will depend both on the singularities of the sector to which they refer and on the objectives pursued by their constitution. Nevertheless, we can start from a general characterisation that serves to delimit their main implications from a legal perspective.
- Thus, in the first place, each participant must retain control over the data contributed to the common space, which in principle implies the freedom to decide not only on their incorporation but also on their withdrawal, with the nuances that may derive from regulatory obligations in this respect, as may be the case with public entities. It is also essential to ensure conditions of technological neutrality, so that the space is not tied to a specific tool or solution and can move to other environments and use other infrastructures freely. In this respect, it is particularly important that the different spaces are built on parameters that allow for their interoperability, so that they can be interconnected and data can be migrated between different infrastructures when required.
- Adequate conditions for access to data and for their subsequent use must also be guaranteed. Specifically, this requirement has important consequences from the perspective of the rules on free competition, so that, on the one hand, undue situations of prevalence and/or concentration in a specific market are not generated and, on the other hand, those cases of re-use of data that are illegal or, where appropriate, contrary to the principles and objectives that were previously established when the corresponding space was created, are avoided.
- Particularly important is the design of a governance model that precisely establishes the conditions for the participation of the various actors involved, in particular their rights and obligations, who will be responsible for adopting the decisions relating to the design of the space and its subsequent practical execution, also contemplating the mechanisms for the resolution of potential conflicts that may arise beyond the unquestionable judicial route that, in principle, would always be available.
Legal implications of data spaces
Since the approval of Directive (EU) 2019/1024 on open data and the re-use of public sector information, there have been important regulatory developments affecting data spaces, including Regulation (EU) 2022/868 on European data governance, which provides for a specific regime for intermediation services and altruism in the transfer of data.
Thus, recently, Implementing Regulation (EU) 2023/138 has been published, establishing the high-value datasets that public sector entities have to make available under technical and legal conditions that facilitate their re-use. Other initiatives of general scope are also in the pipeline that are set to have a major direct impact on data spaces, including the proposal for harmonised rules for fair access to and use of data (Data Act).
Beyond this transversal regulatory framework, it is necessary to distinguish those spaces that have a specific regulation from those that, on the contrary, do not, since in the latter case the determination of the applicable legal rules will have to be made using other non-regulatory legal instruments, i.e. mainly through the agreement - whether in the form of a contract, agreement, etc. - between the subjects that participate in the creation of the space and decide on its initial configuration.
It is also decisive whether a public sector entity is involved in the space, since, if so, it could join the space on an equal footing with the private parties or, as the case may be, adopt a management, control or supervisory role that would be incompatible with its participation under the first modality insofar as such a position could interfere with the normal functioning of the space. If this is the case, an appropriate functional and organisational separation should be envisaged, so that different entities would be in charge of each task: on the one hand, providing data to the space and using them and, on the other, managing the operation of the space.
On the other hand, it could be the case that there is a separate regulatory framework for the space in question, as is being considered at European level in the area of health data. In this case, it is the sectoral regulations themselves that would establish the conditions for participation in the space, which could even be compulsory; the technical, organisational, legal and economic premises applicable, both as regards the parties that provide the data and those that intend to re-use them; the assumptions or, where appropriate, the conditions under which the re-use of the information would not be admissible; or, among other things, the institutional guarantees to be taken into account and, above all, the organisational structures in charge of enforcing compliance with the regulatory provisions governing the corresponding space.
In short, sectoral spaces constitute a model that goes beyond the mere exchange of data between various subjects and that also goes beyond -although it may include, depending on the case- the re-use of public sector information. Specifically, these are ecosystems in which, in general, private entities are called upon to play an important role, which does not necessarily mean that the public sector is excluded from active participation. However, this type of initiative is highly complex, not only because of the configuration of the sectoral space itself but, above all, because of the ambitious approach involved in the future integration of several spaces, whether at national or, to an even greater extent, European level, which reinforces the importance of initiatives such as Gaia-X.
In the absence of a specific regulatory framework for data spaces, it is essential to establish the appropriate conditions for the design and implementation of these spaces to be carried out with the greatest legal guarantees, taking into account the ultimate objective pursued: to facilitate the creation of value-added digital services based on technological innovation.
Content prepared by Julián Valero, Professor at the University of Murcia and Coordinator of the "Innovation, Law and Technology" Research Group (iDerTec).
The contents and points of view reflected in this publication are the sole responsibility of the author.
The public sector in Spain will have the duty to guarantee the openness of its data by design and by default, as well as its reuse. This is the result of the amendment of Law 37/2007 on the reuse of public sector information in application of European Directive 2019/1024.
This new wording of the regulation seeks to broaden the scope of application of the Law in order to bring the legal guarantees and obligations closer to the current technological, social and economic context. In this scenario, the current regulation takes into account that greater availability of public sector data can contribute to the development of cutting-edge technologies such as artificial intelligence and all its applications.
Moreover, this initiative is aligned with the European Union's Data Strategy aimed at creating a single data market in which information flows freely between states and the private sector in a mutually beneficial exchange.
From high-value data to the responsible unit of information: obligations under Law 37/2007
In the following infographic, we highlight the main obligations contained in the consolidated text of the law. Emphasis is placed on duties such as promoting the opening of High Value Datasets (HVDS), i.e. datasets with a high potential to generate social, environmental and economic benefits. As required by law, HVDS must be published under an open data attribution licence (CC BY 4.0 or equivalent), in machine-readable format and accompanied by metadata describing the characteristics of the datasets. All of this will be publicly accessible and free of charge with the aim of encouraging technological, economic and social development, especially for SMEs.
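As an illustration of the three publication conditions just described (open licence, machine-readable format, descriptive metadata), the following minimal sketch checks a dataset record against them. The field names, the accepted licence list and the list of machine-readable formats are assumptions made for this example; they are not taken from the law or from any catalogue specification.

```python
# Assumed lists: the law refers to CC BY 4.0 "or equivalent" and to
# machine-readable formats in general, so both sets are illustrative only.
ACCEPTED_LICENCES = {"CC BY 4.0"}
MACHINE_READABLE_FORMATS = {"CSV", "JSON", "XML", "RDF-XML"}


def hvd_issues(record):
    """Return the list of problems found in one (hypothetical) dataset record."""
    issues = []
    if record.get("licence") not in ACCEPTED_LICENCES:
        issues.append("licence is not in the accepted list")
    formats = {fmt.upper() for fmt in record.get("formats", [])}
    if not formats & MACHINE_READABLE_FORMATS:
        issues.append("no machine-readable format")
    if not record.get("description"):
        issues.append("missing descriptive metadata")
    return issues


# Hypothetical records used only for illustration.
compliant = {
    "title": "Example dataset",
    "description": "Hypothetical record used only for illustration.",
    "licence": "CC BY 4.0",
    "formats": ["csv", "pdf"],
}
incomplete = {
    "title": "Scanned report",
    "licence": "All rights reserved",
    "formats": ["pdf"],
}

print(hvd_issues(compliant))
print(hvd_issues(incomplete))
```

A check of this kind could run before a record is sent to a catalogue, although the actual validation rules would have to follow the applicable technical standard.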
In addition to the publication of high-value data, all public administrations will be obliged to have their own data catalogues that will interoperate with the National Catalogue following the NTI-RISP, with the aim of contributing to its enrichment. As with HVDS, access to public administrations' datasets must be free of charge, except where marginal costs resulting from data processing may apply.
To guarantee data governance, the law establishes the need to designate a unit responsible for information for each entity to coordinate the opening and re-use of data, and to be in charge of responding to citizens' requests and demands.
In short, Law 37/2007 has been modified with the aim of offering legal guarantees to the demands of competitiveness and innovation raised by technologies such as artificial intelligence or the internet of things, as well as to realities such as data spaces where open data is presented as a key element.

1. Introduction
Visualizations are graphical representations of data that communicate the information linked to them in a simple and effective way. The visualization possibilities are very wide, from basic representations, such as line, bar or pie charts, to visualizations configured on interactive dashboards. Visualizations play a fundamental role in drawing conclusions using visual language, and also make it possible to detect patterns, trends and anomalous data or to project predictions, among many other functions.
In this section of "Step-by-Step Visualizations" we periodically present practical exercises on visualizing open data available in datos.gob.es or other similar catalogs. They address and describe in a simple way the stages needed to obtain the data, perform the relevant transformations and analyses and, finally, create interactive visualizations. From these visualizations we can extract information to summarize in the final conclusions. In each of these practical exercises, simple and well-documented code developments are used, as well as free-to-use tools. All generated material is available for reuse in the Github data lab repository belonging to datos.gob.es.
In this practical exercise, we have carried out a simple, well-documented code development based on free-to-use tools.
Access the data lab repository on Github.
Run the data pre-processing code on Google Colab.
2. Objective
The main objective of this post is to show how to make an interactive visualization based on open data. For this practical exercise we have used a dataset provided by the Ministry of Justice that contains information on the results of toxicological tests carried out after traffic accidents, which we will cross with the data published by the Central Traffic Headquarters (DGT) detailing the fleet of vehicles registered in Spain.
By crossing these data we can analyze the ratios of positive toxicological results in relation to the fleet of registered vehicles.
It should be noted that the Ministry of Justice makes available to citizens various dashboards to view data on toxicological results in traffic accidents. The difference is that this practical exercise emphasizes the didactic part: we will show how to process the data and how to design and build the visualizations.
3. Resources
3.1. Datasets
For this case study, a dataset provided by the Ministry of Justice has been used, which contains information on the results of toxicological tests carried out after traffic accidents. This dataset is in the following Github repository:
The datasets of the fleet of vehicles registered in Spain have also been used. These data sets are published by the Central Traffic Headquarters (DGT), an agency under the Ministry of the Interior. They are available on the following page of the datos.gob.es Data Catalog:
3.2. Tools
The data preprocessing tasks have been carried out in the Python programming language, written in a Jupyter Notebook hosted in the Google Colab cloud service.
Google Colab (also called Google Colaboratory) is a free cloud service from Google Research that allows you to program, execute and share code written in Python or R from your browser, so it does not require the installation or configuration of any tool.
For the creation of the interactive visualization, the Google Data Studio tool has been used.
Google Data Studio is an online tool that allows you to make graphs, maps or tables that can be embedded in websites or exported as files. This tool is simple to use and allows multiple customization options.
If you want to know more about tools that can help you process and visualize data, you can consult the report "Data processing and visualization tools".
4. Data processing or preparation
Before building an effective visualization, we must first prepare the data, paying special attention to how they are obtained and to validating their content, ensuring that they are in an appropriate and consistent format for processing and that they do not contain errors.
The processes described below are documented in the Notebook, which you can also run from Google Colab.
As a first step of the process, it is necessary to perform an exploratory data analysis (EDA) in order to properly interpret the starting data and to detect anomalies, missing data or errors that could affect the quality of subsequent processes and results. Pre-processing of data is essential to ensure that analyses or visualizations subsequently created from it are reliable and consistent. If you want to know more about this process, you can consult the Practical Guide to Introduction to Exploratory Data Analysis.
The next step to take is the generation of the preprocessed data tables that we will use to generate the visualizations. To do this we will adjust the variables, cross data between both sets and filter or group as appropriate.
The steps followed in this data preprocessing are as follows:
- Importing libraries
- Loading data files to use
- Detection and processing of missing data (NAs)
- Modifying and adjusting variables
- Generating tables with preprocessed data for visualizations
- Storage of tables with preprocessed data
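The steps listed above can be sketched in a few lines of Python. This is only a minimal illustration that uses the standard library and an inline sample; the column names, the sample values and the decision to discard incomplete rows are assumptions made for this example and do not reproduce the code of the Notebook.

```python
# Importing libraries
import csv
import io

# Hypothetical sample standing in for one of the raw files; the real
# column names and values in the source files may differ.
RAW_CSV = """community,vehicle_type,registered
Community A,Passenger car,100
Community A,Motorcycle,
Community B,Passenger car,40
Community B,Motorcycle,10
"""
FIELDS = ["community", "vehicle_type", "registered"]


def load_rows(text):
    """Loading data: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Detection of missing data: count empty cells per column."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
    return missing


def clean_rows(rows):
    """Adjusting variables: drop incomplete rows and cast counts to int."""
    cleaned = []
    for row in rows:
        registered = (row["registered"] or "").strip()
        if registered == "":
            continue  # assumption: incomplete rows are discarded, not imputed
        cleaned.append({
            "community": row["community"].strip(),
            "vehicle_type": row["vehicle_type"].strip(),
            "registered": int(registered),
        })
    return cleaned


def to_csv(rows):
    """Storage: serialise the preprocessed table back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = load_rows(RAW_CSV)
missing = count_missing(rows)
table = clean_rows(rows)
output = to_csv(table)
print(missing)
print(output)
```

In a real run the inline sample would be replaced by the downloaded files, and the CSV text returned by `to_csv` would be written to disk so that the visualization tool can load it.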
You can reproduce this analysis, since the source code is available in our GitHub account. The code is provided as a Jupyter Notebook that, once loaded into the development environment, you can easily execute or modify. Given the informative nature of this post, and to help non-specialized readers follow it, the code is not intended to be the most efficient but the easiest to understand, so you will probably think of many ways to optimize it to achieve similar results. We encourage you to do so!
5. Generating visualizations
Once the data have been preprocessed, we move on to the visualizations. These interactive visualizations have been created with the Google Data Studio tool. As it is an online tool, no software needs to be installed to interact with or generate a visualization, but the data tables we provide must be properly structured, which is why we carried out the previous data preparation steps.
The starting point is to pose a series of questions that the visualizations will help us answer. We propose the following:
- How is the fleet of vehicles in Spain distributed by Autonomous Communities?
- What type of vehicle is involved to a greater and lesser extent in traffic accidents with positive toxicological results?
- Where are there more toxicological findings in traffic fatalities?
Let's look for the answers by looking at the data!
5.1. Fleet of vehicles registered by Autonomous Communities
This visual representation has been made considering the number of vehicles registered in the different Autonomous Communities, breaking down the total by type of vehicle. The data, corresponding to the average of the month-to-month records of the years 2020 and 2021, are stored in the "parque_vehiculos.csv" table generated in the preprocessing of the starting data.
Through a choropleth map we can visualize which Autonomous Communities have the largest fleets of vehicles. The map is complemented by a ring chart that shows each Autonomous Community's percentage of the total.
As defined in the "Data visualization guide of the Generalitat Catalana", choropleth maps show the values of a variable on a map by painting each region in a certain color. They are used when you want to find geographical patterns in data that are categorized by zones or regions.
Ring charts, a type of pie chart, use a circular representation that shows how the data is distributed proportionally.
Once the visualization is loaded, the drop-down menu offers the option to filter by type of vehicle.
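The two calculations behind this visualization, the average of the monthly records and the percentage of the total shown in the ring chart, can be sketched as follows. The monthly snapshots, the community names and the tuple layout are hypothetical and serve only to show the grouping logic; they are not values from the DGT files.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical monthly snapshots: (month, community, vehicle_type, registered)
MONTHLY = [
    ("2020-01", "Community A", "Passenger car", 100),
    ("2020-02", "Community A", "Passenger car", 110),
    ("2020-01", "Community A", "Motorcycle", 20),
    ("2020-02", "Community A", "Motorcycle", 30),
    ("2020-01", "Community B", "Passenger car", 40),
    ("2020-02", "Community B", "Passenger car", 50),
]


def average_fleet(records):
    """Average the monthly counts for each (community, vehicle type) pair."""
    groups = defaultdict(list)
    for _month, community, vehicle_type, registered in records:
        groups[(community, vehicle_type)].append(registered)
    return {key: mean(values) for key, values in groups.items()}


def share_by_community(fleet):
    """Percentage of the total fleet held by each community (ring chart)."""
    totals = defaultdict(float)
    for (community, _vehicle_type), value in fleet.items():
        totals[community] += value
    grand_total = sum(totals.values())
    return {community: 100 * value / grand_total
            for community, value in totals.items()}


fleet = average_fleet(MONTHLY)
shares = share_by_community(fleet)
print(fleet)
print(shares)
```

Filtering by type of vehicle, as the drop-down option does, would simply restrict `fleet` to the keys with the chosen vehicle type before computing the shares.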
View full screen visualization
5.2. Ratio of positive toxicological results for different types of vehicles
This visual representation has been made considering the ratios of positive toxicological results to the number of vehicles nationwide. We count as a positive result each time a subject tests positive in the analysis of each of the substances, that is, the same subject can count several times if their results are positive for several substances. For this purpose, the table "resultados_vehiculos.csv" has been generated during data preprocessing.
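The counting rule described above, where one subject contributes one positive per substance, can be illustrated with a short sketch. The subjects, the fleet figures and the scale of 100,000 vehicles are hypothetical values chosen for this example; the scale actually used in the exercise is not stated here.

```python
from collections import Counter

# Hypothetical subjects: vehicle type plus the substances that tested positive.
SUBJECTS = [
    {"vehicle_type": "Motorcycle", "positives": ["alcohol", "cannabis"]},
    {"vehicle_type": "Motorcycle", "positives": ["cocaine"]},
    {"vehicle_type": "Passenger car", "positives": ["alcohol"]},
    {"vehicle_type": "Passenger car", "positives": []},
]
# Hypothetical national fleet per vehicle type.
FLEET = {"Motorcycle": 1_000, "Passenger car": 10_000}
SCALE = 100_000  # assumption: ratios expressed per 100,000 vehicles


def count_positives(subjects):
    """Count one positive per (vehicle type, substance), so a subject who is
    positive for several substances is counted several times."""
    counts = Counter()
    for subject in subjects:
        for substance in subject["positives"]:
            counts[(subject["vehicle_type"], substance)] += 1
    return counts


def positive_ratios(counts, fleet, scale=SCALE):
    """Divide each count by the fleet of its vehicle type."""
    return {key: scale * n / fleet[key[0]] for key, n in counts.items()}


counts = count_positives(SUBJECTS)
ratios = positive_ratios(counts, FLEET)
print(sum(counts.values()))  # total positives, larger than the number of positive subjects
print(ratios)
```

In this sample three subjects test positive but four positives are counted, which is exactly the effect the counting rule produces.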
Using a stacked bar chart, we can evaluate the ratios of positive toxicological results by number of vehicles for different substances and different types of vehicles.
As defined in the "Data visualization guide of the Generalitat Catalana", stacked bar graphs are used when you want to compare the total value of the sum of the segments that make up each of the bars. At the same time, they offer insight into how large these segments are.
When stacked bars add up to 100%, so that each segmented bar occupies the full height of the chart, the chart represents parts of a whole.
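Converting a stacked bar into its 100% version only requires dividing each segment by the total of the bar. A minimal sketch, with hypothetical segment values:

```python
def to_percent_stack(segments):
    """Rescale the segments of one bar so that they add up to 100."""
    total = sum(segments.values())
    if total == 0:
        # An empty bar has no meaningful proportions.
        return {name: 0.0 for name in segments}
    return {name: 100 * value / total for name, value in segments.items()}


# Hypothetical counts of positives per substance for one vehicle type.
bar = {"alcohol": 6, "cannabis": 3, "cocaine": 1}
percent_bar = to_percent_stack(bar)
print(percent_bar)  # {'alcohol': 60.0, 'cannabis': 30.0, 'cocaine': 10.0}
```

The absolute version of the chart keeps `bar` as it is, which allows totals to be compared between bars; the percentage version gives up that comparison in exchange for showing the composition of each bar.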
The table provides the same information in a complementary way.
Once the visualization is loaded, the drop-down menu offers the option to filter by type of substance.
View full screen visualization
5.3. Ratio of positive toxicological results for the Autonomous Communities
This visual representation has been made taking into account the ratios of the positive toxicological results by the fleet of vehicles of each Autonomous Community. We count as a positive result each time a subject tests positive in the analysis of each of the substances, that is, the same subject can count several times in the event that their results are positive for several substances. For this purpose, the "resultados_ccaa.csv" table has been generated during data preprocessing.
It should be noted that the Autonomous Community where the vehicle is registered does not have to coincide with the Autonomous Community where the accident was recorded. However, since this is a didactic exercise and it is assumed that in most cases they coincide, we have started from the basis that both coincide.
Through a choropleth map we can visualize which Autonomous Communities have the highest ratios. To the information provided in the first visualization on this type of graph, we must add the following.
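The crossing of both tables by Autonomous Community can be sketched as follows. The sketch relies on the same simplification stated above (the community of the accident is treated as the community of registration), and the community names, the figures and the scale of 100,000 vehicles are hypothetical.

```python
# Hypothetical positives by community where the accident was recorded.
POSITIVES_BY_COMMUNITY = {"Community A": 12, "Community B": 90, "Unknown": 3}
# Hypothetical fleet by community of registration.
FLEET_BY_COMMUNITY = {"Community A": 200_000, "Community B": 4_500_000}


def community_ratios(positives, fleet, scale=100_000):
    """Cross both tables on the community name.

    Assumption: accident community and registration community coincide.
    Communities without a fleet figure are reported instead of being
    silently dropped.
    """
    ratios = {}
    unmatched = []
    for community, n in positives.items():
        if community not in fleet:
            unmatched.append(community)
            continue
        ratios[community] = scale * n / fleet[community]
    return ratios, unmatched


ratios_by_community, unmatched = community_ratios(
    POSITIVES_BY_COMMUNITY, FLEET_BY_COMMUNITY
)
print(ratios_by_community)
print(unmatched)
```

Listing the unmatched keys is a simple way to notice spelling differences between the two sources before the ratios are drawn on the map.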
As defined in the "Data Visualization Guide for Local Entities", one of the requirements for choropleth maps is to use a numerical measure or datum, a categorical datum for the territory, and a polygon geographic datum.
The table and bar chart provide the same information in a complementary way.
Once the visualization is loaded, the drop-down menu offers the option to filter by type of substance.
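These three requirements can be expressed as a simple check on each map feature. The sketch below uses a GeoJSON-style dictionary purely for illustration; the property names `territory` and `measure` are assumptions, and how the visualization tool actually receives the polygons is not covered here.

```python
from numbers import Real


def is_choropleth_ready(feature):
    """Check that a feature has a territory name, a numeric measure and a
    polygon geometry (GeoJSON-style structure, assumed for this example)."""
    properties = feature.get("properties") or {}
    geometry = feature.get("geometry") or {}
    has_territory = isinstance(properties.get("territory"), str)
    measure = properties.get("measure")
    # bool is a subclass of int, so it is excluded explicitly.
    has_measure = isinstance(measure, Real) and not isinstance(measure, bool)
    has_polygon = geometry.get("type") in {"Polygon", "MultiPolygon"}
    return has_territory and has_measure and has_polygon


# Hypothetical features: one valid, one with a point instead of a polygon.
valid_feature = {
    "properties": {"territory": "Community A", "measure": 6.0},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
    },
}
point_feature = {
    "properties": {"territory": "Community B", "measure": 2.0},
    "geometry": {"type": "Point", "coordinates": [0, 0]},
}

print(is_choropleth_ready(valid_feature))
print(is_choropleth_ready(point_feature))
```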
View full screen visualization
6. Conclusions of the study
Data visualization is one of the most powerful mechanisms for exploiting and analyzing the implicit meaning of data, regardless of the type of data and the degree of technological knowledge of the user. Visualizations allow us to build meaning on top of data and create narratives based on graphical representation. In the set of graphical representations of data that we have just implemented, the following can be observed:
- The fleet of vehicles of the Autonomous Communities of Andalusia, Catalonia and Madrid accounts for about 50% of the country's total.
- The highest ratios of positive toxicological results occur in motorcycles and are, for most substances, of the order of three times higher than the next highest ratio, that of passenger cars.
- The lowest positive toxicology result ratios occur in trucks.
- Two-wheeled vehicles (motorcycles and mopeds) have higher "cannabis" ratios than those obtained in "cocaine", while four-wheeled vehicles (cars, vans and trucks) have higher "cocaine" ratios than those obtained in "cannabis".
- The Autonomous Community where the ratio for the total of substances is highest is La Rioja.
It should be noted that in the visualizations you have the option to filter by type of vehicle and type of substance. We encourage you to do so to draw more specific conclusions about the information you're most interested in.
We hope that this step-by-step visualization has been useful for learning some very common techniques in the treatment and representation of open data. We will be back with new examples of reuse. See you soon!
Over the past year, the academic section of data.europa.eu expanded its open data training offer by publishing new conferences, courses and workshops. Thus, data.europa.academy shared a total of 15 webinars related to open data, data spaces and other topics and technical issues around the data economy.
In line with the online training philosophy of this area of expertise, professionals and users interested in open data were able to attend the conferences from anywhere in the EU by filling in a web-based registration form.
Among the 2022 webinars were workshops and seminars on open data quality and metadata, the legal and technical perspectives on data openness, the potential of real-time open data, and the opportunities open data offers citizens when developing solutions and services.
In this way, the range of content is very broad in terms of subject matter and level of technical accessibility, which makes it easy to filter the webinars according to interests. In addition, as many of the training sessions are based on reports previously published by the European data portal, they have very useful supporting documentation to complete the knowledge acquired.
In order to bring together this valuable source of knowledge in an orderly fashion, below you can access the 15 lectures published over the past year, as well as their respective supporting presentations.
Data quality and metadata
- Description: This webinar focuses on explaining why high quality data and metadata are the basis for beneficial production outcomes and for fostering informed decision making.
- Viewing link: https://www.youtube.com/watch?v=PcyJX8xbyik
Best practices of open data: the case of Estonia, Slovenia and Ukraine
- Description: Through this conference, the European portal tries to explain the importance and impact that the reuse of open data can have. To do so, they use the presentation of good practices and use cases of several European portals based on this type of data.
- Viewing link: https://www.youtube.com/watch?v=mTVayKTUC-s
Real-time data
- Description: This course explains what real-time data is and which standards and technologies are most commonly used with this type of data.
- Viewing link: https://www.youtube.com/watch?v=yl4ZotQQfuk
Demand and reuse of data in the public sector
- Description: This webinar provides an introduction to the re-use of data by public institutions, while focusing on the importance of meeting and measuring the demand for data by this specific user group.
- Viewing link: https://www.youtube.com/watch?v=uTd7Ti0aQNA&t=752s
Opportunities and challenges of citizen-generated data
- Description: This seminar explores how citizen-generated data is currently available in open data portals of different levels of public administrations in Europe.
- Viewing link: https://www.youtube.com/watch?v=4FHaerYTFmc&t=1801s
The role of data.europa.eu in the context of EU data spaces
- Description: This webinar enables data providers to understand how they can make better use of different infrastructures and thus provide more visibility to open data assets by assessing the role of data.europa.eu in contexts of common European data spaces.
- Viewing link: https://www.youtube.com/watch?v=DjhGkGMoKso
Eurostat's regional yearbook goes digital
- Description: This is a conference dedicated to the evolution of Eurostat's regional yearbook from a printed publication to a digital publication that functions as a modern interactive tool.
- Viewing link: https://www.youtube.com/watch?v=q0mgg4IbXUY
Data.europa.eu - The official European data portal (webinar for data providers)
- Description: This webinar provides an overview of data.europa.eu, a portal that acts as a gateway to public sector information on different open data portals of EU institutions, agencies and bodies and national and international organisations around the world. The training provides an overview of the services provided through the portal.
- Viewing link: https://www.youtube.com/watch?v=4s9Yol8GsSc
Measuring the impact of open data in Europe
- Description: The aim of this conference is to provide an overview of the methods to assess the impact of open data. After a short introduction, guest speakers from the national open data teams of Poland and France presented real examples of how they measure the impact of open data in these countries.
- Viewing link: https://www.youtube.com/watch?v=Cp7-qSNLR1U
Data visualisation
- Description: To highlight the potential behind data visualisations, through this webinar, and additional training materials, users can learn how to get the most out of open data catalogues through different ways of visualising them.
- Viewing link: https://www.youtube.com/watch?v=XY91H9TcO1A
- Supporting documents: https://data.europa.eu/en/academy/data-visualisation
Use Case Observatory Stories - Volume I
- Description: This webinar is part of a series of three sessions dedicated to the research project "Use Case Observatory" and its publications. In the first part of this training, an overview of the project, its methodology and the findings of the publication in 2022 are given. During the second part of the webinar, four of the managers of the thirty reuse cases participating in the research take the floor to present their open data solutions.
- Viewing link: https://www.youtube.com/watch?v=-FT0OxfgF0M
Trends in Geospatial Data
- Description: This seminar focuses on emerging trends in the geospatial community and how these along with standards and new ideas can be relevant to data.europa.eu.
- Viewing link: https://www.youtube.com/watch?v=Hyt1MNm9l00
Federation of geospatial data on data.europa.eu
- Description: This training aims to present the geospatial data that can be found on data.europa.eu, as well as to explain the process of federating this type of data. The speakers took a close look at a geospatial dataset on data.europa.eu and explored the journey of its metadata from the source geo-catalogue to the portal.
- Viewing link: https://www.youtube.com/watch?v=7UPneA4QOoo
Understanding open data from the perspective of legal openness (webinar for data providers)
- Description: This webinar aims to explain and discuss what openness means from a legal perspective and how it can best be achieved. The aim is not to provide purely theoretical legal training, but to identify best practices and resources that data providers can use to achieve openness and to realise when openness cannot be achieved.
- Viewing link: https://www.youtube.com/watch?v=53QdDf4LJN0&t=1s
Understanding the technical openness of open data (webinar for data providers)
- Description: The aim of this training is to guide data providers through the principle of technical openness and the data management process of moving from closed to open data formats. An open format is one in which the programme specifications are freely available to anyone, free of charge and without limitations on re-use imposed by intellectual property rights.
- Viewing link: https://www.youtube.com/watch?v=cQMwMXd4n9I&t=17s
For the new year that is already underway, data.europa.eu aims to continue to expand the training resources of its academic section with the programming of seminars such as Data and Competition Law or another linked to the recent publication of the Open Data Maturity 2022 report.
For more information on future seminars, follow the link below to the European open data portal and stay tuned for news on this topic from datos.gob.es.
Data science has a key role to play in building a more equitable, fair and inclusive world. Open data related to justice and society can serve as the basis for the development of technological solutions that drive a legal system that is not only more transparent, but also more efficient, helping lawyers to do their work in a more agile and accurate way. This is what is known as LegalTech, and includes tools that make it possible to locate information in large volumes of legal texts, perform predictive analyses or resolve legal disputes easily, among other things.
In addition, this type of data drives the development of solutions aimed at responding to the great social challenges facing humanity, helping to promote the common good, such as the inclusion of certain groups, aid for refugees and people in conflict zones or the fight against gender-based violence.
When we talk about open data related to justice and society, we refer both to legal data and to other data that can have an impact on universalising access to basic services, achieving equity, ensuring that all people have the same opportunities for development and promoting collaboration between different social agents.
What types of data on justice and society can I find in datos.gob.es?
On our portal you can access a wide catalogue of data that is classified by different sectors. The Legislation and Justice category currently has more than 5,000 datasets of different types, including information related to criminal offences, appeals or victims of certain crimes, among others. For its part, the Society and Welfare category has more than 8,000 datasets, including, for example, lists of aid, associations or information on unemployment.
Of all these datasets, here are some examples of the most outstanding ones, together with the format in which you can consult them:
At state level
- Spanish Statistical Office (INE). Offences according to sex by Autonomous Communities and cities. CSV, XLSX, XLS, JSON, PC-Axis, HTML (landing page for data download)
- Spanish Statistical Office (INE). 2030 Agenda SDG - Population at risk of poverty or social exclusion: AROPE indicator. CSV, XLS, XLSX, HTML (landing page for data download)
- Spanish Statistical Office (INE). Internet use by demographic characteristics and frequency of use. CSV, XLSX, XLS, JSON, PC-Axis, HTML (landing page for data download)
- Spanish Statistical Office (INE). Average expenditure according to size of the municipality of residence. CSV, XLSX, XLS, JSON, PC-Axis, HTML (landing page for data download)
- Spanish Statistical Office (INE). Retirement age in access to Benefit. CSV, XLSX
- Ministry of Justice. Judicial Census. XLSX, PDF, HTML (landing page for data download)
At Autonomous Community level
- Cantabrian Institute of Statistics. Statistics on annulments, separations and divorces. RDF-XML, XLS, JSON, ZIP, PC-Axis, HTML (landing page for data download).
- Basque Government. Standards and laws in force applicable in the Basque Country. JSON, JSON-P, XML, XLSX.
- Basque Government. Locating mass graves from the Civil War and Francoism. CSV, XLS, XML.
- Generalitat Catalana. Ministry of Justice resources statistics. XLSX, HTML (landing page for data download).
- Government of Catalonia. Youth justice statistics. XLSX, HTML (landing page for data download).
- Autonomous Community of Navarre. Statistics on Transfer of Property Rights. XLSX, HTML (landing page for data download).
- Principality of Asturias. Sustainable Development Goals indicators in Asturias. HTML, XLSX, ZIP.
- Principality of Asturias. Justice in Asturias: staffing levels of the judicial bodies of the Principality of Asturias according to type. HTML (landing page for data download).
- Canary Islands Statistics Institute (ISTAC). Judges and magistrates active in the Canary Islands. HTML, JSON, PC-Axis.
At the local level
- Santa Cruz de Tenerife City Council. Parking spaces for people with reduced mobility. SHP, KML, KMZ, RDF-XML, CSV, JSON, XLS
- Madrid City Council. Justice Administration Offices in the city of Madrid. CSV, XML, RSS, RDF-XML, JSON, HTML (landing page for data download)
- Gijón City Council. Security forces. JSON, CSV, XLS, PDF, HTML, TSV, text, XML, HTML (landing page for data download)
- Madrid City Council. Child and Family Care Centres. CSV, JSON, RDF-XML, XML, RSS, HTML (landing page for data download).
- Zaragoza City Council. List of police stations. CSV, JSON.
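The formats listed next to each dataset matter most when the data is consumed by a program rather than browsed by a person. As a minimal sketch, the following Python snippet takes a few of the entries above and picks, for each one, the most convenient machine-readable format on offer. The preference order and the function names (`pick_format`, `plan_downloads`) are assumptions made for illustration; they are not part of datos.gob.es or of any publisher's tooling.

```python
# Assumed preference order (illustrative only): structured, machine-readable
# formats first. PDF and HTML landing pages are deliberately left out.
PREFERRED = ["JSON", "CSV", "XML", "XLSX", "XLS"]

# Format lists copied from some of the catalogue entries above.
DATASETS = {
    "Zaragoza City Council. List of police stations": ["CSV", "JSON"],
    "Ministry of Justice. Judicial Census": ["XLSX", "PDF", "HTML"],
    "Basque Government. Locating mass graves from the Civil War and Francoism": [
        "CSV", "XLS", "XML",
    ],
    "Principality of Asturias. Justice in Asturias": ["HTML"],
}


def pick_format(formats, preferred=PREFERRED):
    """Return the first preferred format the dataset offers, or None."""
    available = {fmt.upper() for fmt in formats}
    for fmt in preferred:
        if fmt in available:
            return fmt
    return None


def plan_downloads(datasets):
    """Map each dataset name to its chosen format (None = landing page only)."""
    return {name: pick_format(formats) for name, formats in datasets.items()}


if __name__ == "__main__":
    for name, fmt in plan_downloads(DATASETS).items():
        print(f"{name}: {fmt or 'no machine-readable format listed'}")
```

Datasets for which the function returns `None` are those that, in the list above, only offer an HTML landing page, so they would need manual download or a different access route.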
Some examples of re-use of justice and social good related data
In the companies and applications section of datos.gob.es you can find some examples of solutions developed with open data related to justice and social good. One example is Papelea, a company that answers users' legal and administrative questions. To do so, it draws on public information such as the administrative procedures of the main administrations, legal regulations and jurisprudence. Another example is the ISEAK Foundation, which specialises in the evaluation of public policies on employment, inequality, inclusion and gender, using public data sources such as the Spanish Statistical Office (INE), Social Security, Eurostat and Opendata Euskadi.
Internationally, there are also examples of initiatives created to monitor procedural cases or improve the transparency of police services. In Europe, there is a boom in the creation of companies focused on legal technology that seek to improve the daily life of citizens, as well as initiatives that seek to use data for equity. Concrete examples of solutions in this area are miHub for asylum seekers and refugees in Cyprus, or Surviving in Brussels, a website for the homeless and people in need of access to services such as medical help, housing, job offers, legal help or financial advice.
Do you know of a company that uses this kind of data, or an application that relies on it to contribute to the advancement of society? Then do not hesitate to leave us a comment with all the information or email us at dinamizacion@datos.gob.es.
