The Open Data Maturity Study 2022 provides a snapshot of the level of development of policies promoting open data in countries, as well as an assessment of the expected impact of these policies. Among its findings, it highlights that measuring the impact of open data is a priority, but also a major challenge across Europe.
In this edition, there has been a 7% decrease in the average maturity level in the impact dimension for EU27 countries, which coincides with the restructuring of the impact dimension indicators. However, this is not so much a decrease in the level of maturity as a more accurate picture of the difficulty in assessing the impact resulting from the re-use of open data.
Therefore, in order to better understand how to make progress on the challenge of measuring the impact of open data, we have looked at existing best practices across Europe. To this end, we have worked with the data provided by the countries in their responses to the survey questionnaire, and in particular with those of the ten countries that scored more than 500 points in the impact dimension, regardless of their overall score and position in the ranking: France, Ireland, Cyprus, Estonia and the Czech Republic, which scored the maximum 600 points; and Poland, Spain, Italy, Denmark and Sweden, which scored above 510 points.
In the report we provide a country profile for each of the ten countries, analysing the country's performance in all dimensions of the study in general terms, and the different components of the impact dimension in detail, summarising the practices that have led to each high score based on the analysis of the responses to the questionnaire.
Through this tabbed structure, the document allows for a direct comparison between country indicators and provides a detailed overview of best practices and challenges in measuring the impact of open data through the following indicators:
- "Strategic awareness": It quantifies the awareness and preparedness of countries to understand the level of reuse and impact of open data within their territory.
- "Measuring reuse": It focuses on how countries measure open data re-use and what methods they use.
- "Impact created": It collects data on the impact created within four impact areas: government impact (formerly policy impact), social impact, environmental impact and economic impact.
Finally, the report provides a comparative analysis of these countries and draws out a series of recommendations and good practices that aim to provide ideas on how to improve performance on each of the three indicators measured in the study.
If you want to know more about the content of this report, you can watch the interview with its author.
Below, you can download the full report, the executive summary and a presentation-summary.
Content prepared by Jose Luis Marín, Senior Consultant in Data, Strategy, Innovation & Digitalization.
The contents and views expressed in this publication are the sole responsibility of the author.
The Multisectorial Association of Information (ASEDIE), which brings together the infomediary companies of our country, once again includes among its annual objectives the promotion of the reuse of public and private information. Thus, almost in parallel with the start of the new year, last December ASEDIE shared the progress that the Top 3 has made in most of the autonomous communities, as well as the good expectations for its second edition.
Since this initiative was launched in 2019 to promote the opening of three datasets by the autonomous communities, they have gradually been opening datasets that improve access to information sources, while helping to boost the development of services and applications based on open data. The objective of this project, which in 2021 was included as a Best Practice commitment in the Observatory of the IV Open Government Plan and is supported by the seventeen Autonomous Communities, is to harmonise the opening of public sector databases in order to encourage their reuse and promote the development of the data economy.
First edition: accessible in fifteen autonomous communities
The first edition of Asedie's Top 3 was a success not only because of the datasets selected, but also because of the openness rate achieved four years later. Currently, fifteen of the country's seventeen autonomous communities have managed to open all three databases to the general public: cooperatives, foundations and associations.
2023: the year to complete the opening of the second edition
With the aim of continuing to promote the opening of public information in the different autonomous communities, in 2020, ASEDIE launched a new edition of the top 3 so that those communities that had already overcome the previous challenge could continue to make progress. Thus, for this second edition the selected databases were the following:
- Energy Efficiency Certificates
- Industrial Estates
- Agricultural Transformation Companies
As a result, the second edition of the Top 3 is now accessible in seven autonomous communities. Moreover, the databases related to energy efficiency certificates, information increasingly required at European level, are now openly available in all the autonomous communities of Spain.

Next steps: extending the commitment to open data
Naturally, one of ASEDIE's main annual objectives is to continue promoting regional collaboration in order to complete the opening of the second edition of the Top 3 in the remaining autonomous communities. In parallel, the next edition of the ASEDIE Report will be published on March 22, taking advantage of Open Administration Week. As on previous occasions, this document will serve to take stock of the milestones achieved in the previous year, as well as to list the new challenges.
In fact, in relation to open data, the ASEDIE report is a very useful tool when it comes to broadening knowledge in this area of expertise, as it includes a list of successful cases of infomediary companies and examples of the products and services they produce.
In short, thanks to initiatives such as those developed by ASEDIE, public-private collaboration is becoming more and more constant and tangible, making it easier for companies to reuse public information.
16.5 billion euros. These are the revenues that artificial intelligence (AI) and data are expected to generate in Spanish industry by 2025, according to what was announced last February at the IndesIA forum, the association for the application of artificial intelligence in industry. AI is already part of our daily lives: either by making our work easier by performing routine and repetitive tasks, or by complementing human capabilities in various fields through machine learning models that facilitate, for example, image recognition, machine translation or the prediction of medical diagnoses. All of these activities help us to improve the efficiency of businesses and services, driving more accurate decision-making.
But for machine learning models to work properly, they need quality and well-documented data. Every machine learning model is trained and evaluated with data. The characteristics of these datasets condition the behaviour of the model. For example, if the training data reflects unwanted social biases, these are likely to be incorporated into the model as well, which can have serious consequences when used in high-profile areas such as criminal justice, recruitment or credit lending. Moreover, if we do not know the context of the data, our model may not work properly, as its construction process has not taken into account the intrinsic characteristics of the data on which it is based.
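To make this idea concrete, below is a minimal, self-contained sketch in Python (fully synthetic data; the "group" attribute and all numbers are hypothetical, chosen only for illustration) showing how historical bias in training labels is reproduced by a model trained on them:

```python
# Toy sketch of how bias in training data propagates to a model.
# All data is synthetic; "group" is a hypothetical sensitive attribute.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

group = rng.integers(0, 2, size=n)   # sensitive attribute: 0 or 1
skill = rng.normal(size=n)           # genuinely predictive feature

# Biased historical labels: group 1 was systematically under-approved.
label = ((skill - 0.8 * group + rng.normal(scale=0.5, size=n)) > 0).astype(int)

# Train on the biased labels, including the sensitive attribute.
X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, label)

# Same skill, different group: the model reproduces the historical bias.
same_skill = np.array([[0.5, 0], [0.5, 1]])
print(model.predict_proba(same_skill)[:, 1])  # group 1 gets a lower score
```

Documenting how the training data was collected and labelled is precisely what helps to detect this kind of problem before the model is deployed.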
For these and other reasons, the World Economic Forum suggests that all entities should document the provenance, creation and use of machine learning datasets in order to avoid erroneous or discriminatory results.
What are datasheets for datasets?
One mechanism for documenting this information is known as Datasheets for datasets. This framework proposes that every dataset should be accompanied by a datasheet, which consists of a questionnaire that guides data documentation and reflection throughout the data lifecycle. Some of the benefits are:
- It improves collaboration, transparency and accountability within the machine learning community.
- Mitigates unwanted social biases in models.
- Helps researchers and developers select the most appropriate datasets to achieve their specific goals.
- Facilitates greater reproducibility of results.
Datasheets will vary depending on factors such as knowledge area, existing organisational infrastructure or workflows.
To assist in the creation of datasheets, a questionnaire has been designed with a series of questions, according to the stages of the data lifecycle:
- Motivation. Collects the reasons that led to the creation of the dataset. It also asks who created or funded it.
- Composition. Provides users with the necessary information on the suitability of the dataset for their purposes. It asks, among other questions, what the units of observation in the dataset represent (documents, photos, people, countries), what kind of information each unit provides, and whether there are errors, sources of noise or redundancies in the dataset. It also prompts reflection on data referring to individuals, in order to avoid possible social biases or privacy violations.
- Collection process. It is intended to help researchers and users think about how to create alternative datasets with similar characteristics. It details, for example, how the data were acquired, who was involved in the collection process, or what the ethical review process was like. It deals especially with the ethical aspects of processing data protected by the GDPR.
- Preprocessing, cleansing or tagging. These questions allow data users to determine whether data have been processed in ways that are compatible with their intended uses. They ask whether any preprocessing, cleansing or tagging of the data was performed, and whether the software used to do so is available.
- Uses. This section provides information on the tasks for which the data may or may not be used. To this end, it asks questions such as: Has the dataset already been used for any task? What other tasks could it be used for? Does the composition of the dataset, or the way it was collected, preprocessed, cleaned and labelled, affect other future uses?
- Distribution. This covers how the dataset will be disseminated. Questions focus on whether the data will be distributed to third parties and, if so, how, when, what are the restrictions on use and under what licences.
- Maintenance. The questionnaire ends with questions aimed at planning the maintenance of the data and communicating that plan to the users of the data. For example, it asks whether the dataset will be updated and who will provide support.
It is recommended that all questions are considered prior to data collection, so that data creators can be aware of potential problems. To illustrate how each of these questions could be answered in practice, the model developers have produced an appendix with an example for a given dataset.
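As an illustration of what such a datasheet might look like in practice, here is a minimal sketch that stores the answers as a structured Python object and renders them as a Markdown document. The section names follow the stages of the questionnaire described above; every answer is a hypothetical placeholder:

```python
# Minimal sketch of a machine-readable datasheet following the
# stages of the "Datasheets for Datasets" questionnaire.
# Every answer below is a hypothetical placeholder, not real metadata.

DATASHEET = {
    "Motivation": {
        "For what purpose was the dataset created?": "Placeholder answer.",
        "Who created and funded the dataset?": "Placeholder answer.",
    },
    "Composition": {
        "What do the units of observation represent?": "Placeholder answer.",
        "Are there errors, sources of noise or redundancies?": "Placeholder answer.",
    },
    "Collection process": {
        "How was the data acquired?": "Placeholder answer.",
    },
    "Preprocessing, cleansing or tagging": {
        "Was any preprocessing performed, and is the software available?": "Placeholder answer.",
    },
    "Uses": {
        "Has the dataset already been used for any task?": "Placeholder answer.",
    },
    "Distribution": {
        "Under what licence will the dataset be distributed?": "Placeholder answer.",
    },
    "Maintenance": {
        "Will the dataset be updated, and who will provide support?": "Placeholder answer.",
    },
}


def render_datasheet(sheet: dict) -> str:
    """Render the question/answer pairs as a Markdown datasheet."""
    lines = ["# Datasheet"]
    for section, questions in sheet.items():
        lines.append(f"\n## {section}")
        for question, answer in questions.items():
            lines.append(f"- **{question}** {answer}")
    return "\n".join(lines)


print(render_datasheet(DATASHEET))
```

Keeping the datasheet in a machine-readable form like this makes it easy to version it alongside the dataset and to publish it in several formats.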

Is Datasheets for datasets effective?
The Datasheets for datasets data documentation framework has initially received good reviews, but its implementation continues to face challenges, especially when working with dynamic data.
To find out whether the framework effectively addresses the documentation needs of data creators and users, in June 2022, Microsoft USA and the University of Michigan conducted a study on its implementation. To do so, they conducted a series of interviews and a follow-up on the implementation of the questionnaire by a number of machine learning professionals.
In summary, participants expressed the need for documentation frameworks to be adaptable to different contexts, to be integrated into existing tools and workflows, and to be as automated as possible, partly due to the length of the questions. However, they also highlighted its advantages, such as reducing the risk of information loss, promoting collaboration between all those involved in the data lifecycle, facilitating data discovery and fostering critical thinking, among others.
In short, this is a good starting point, but it will have to evolve, especially to adapt to the needs of dynamic data and documentation flows applied in different contexts.
Content prepared by the datos.gob.es team.
When publishing open data, it is essential to ensure its quality. If data is well documented and of the required quality, it will be easier to reuse, as there will be less additional work for cleaning and processing. In addition, poor data quality can be costly for publishers, who may spend more money on fixing errors than on avoiding potential problems in advance.
To help in this task, the Aporta Initiative has developed the "Practical guide for improving the quality of open data", which provides a compendium of guidelines for acting on each of the characteristics that define quality, driving its improvement. The document takes as a reference the data.europe.eu data quality guide, published in 2021 by the Publications Office of the European Union.
Who is the guide aimed at?
The guide is aimed at open data publishers, providing them with clear guidelines on how to improve the quality of their data.
However, this collection can also provide guidance to data re-users on how to address the quality weaknesses that may be present in the datasets they work with.
What does the guide include?
The document begins by defining the characteristics, according to ISO/IEC 25012, that data must meet in order to be considered quality data, which are shown in the following image.

Next, the bulk of the guide focuses on the description of recommendations and good practices to avoid the most common problems that usually arise when publishing open data, structured as follows:
- A first part detailing a series of general guidelines to guarantee the quality of open data, such as using a standardised character encoding, avoiding duplicate records or incorporating variables with geographic information. For each guideline, a detailed description of the problem, the quality characteristics affected and recommendations for resolution are provided, together with practical examples to facilitate understanding (two of these checks are sketched in code after this list).
- A second part with specific guidelines for ensuring the quality of open data according to the data format used. Specific guidelines are included for CSV, XML, JSON, RDF and APIs.
- Finally, the guide also includes recommendations for data standardisation and enrichment, as well as for data documentation, and a list of useful tools for working on data quality.
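To give a flavour of how some of the general guidelines can be turned into automated checks, here is a minimal sketch in Python using pandas. It tests two of the guidelines mentioned above, standardised character encoding and absence of duplicate records; the file name "datos.csv" is a hypothetical example:

```python
# Sketch of two automated quality checks inspired by the guide's
# general guidelines. The file name "datos.csv" is a placeholder.
import pandas as pd

path = "datos.csv"

# Check 1: the file should be readable with a standard encoding (UTF-8).
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    print("Encoding check: file reads as UTF-8.")
except UnicodeDecodeError as e:
    print(f"Encoding check failed: {e}")

# Check 2: flag exact duplicate records.
df = pd.read_csv(path)
duplicates = df[df.duplicated(keep=False)]
if duplicates.empty:
    print("Duplicate check: no duplicate records found.")
else:
    print(f"Duplicate check: {len(duplicates)} rows involved in duplicates.")
```

Checks like these can be run before each publication, so that quality problems are caught on the publisher's side rather than by re-users.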
You can download the guide here or at the bottom of the page (only available in Spanish).
Additional materials
The guide is accompanied by a series of infographics that compile the above guidelines:
- Infographic "General guidelines for quality assurance of open data".
- Infographic "Guidelines for quality assurance using specific data formats”.
Nowadays we can find a great deal of legislative information on the web. Countries, regions and municipalities make their regulatory and legal texts public through various spaces and official bulletins. The use of this information can be of great use in driving improvements in the sector: from facilitating the location of legal information to the development of chatbots capable of resolving citizens' legal queries.
However, locating, accessing and reusing these documents is often complex, due to differences in legal systems, languages and the different technical systems used to store and manage the data.
To address this challenge, the European Union has a standard for identifying and describing legislation called the European Legislation Identifier (ELI).
What is the European Legislation Identifier?
The ELI emerged in 2012 through Council Conclusions (2012/C 325/02) in which the European Union invited Member States to adopt a standard for the identification and description of legal documents. This initiative has been further developed and enriched by new conclusions published in 2017 (2017/C 441/05) and 2019 (2019/C 360/01).
The ELI, which is based on a voluntary agreement between EU countries, aims to facilitate access, sharing and interconnection of legal information published in national, European and global systems. This facilitates their availability as open datasets, fostering their re-use.
Specifically, the ELI makes it possible to:
- Identify legislative documents, such as regulations or legal resources, by means of a unique identifier (URI), understandable by both humans and machines.
- Define the characteristics of each document through automatically processable metadata. To this end, it uses vocabularies defined by means of ontologies agreed and recommended for each field.
Thanks to this, a series of advantages are achieved:
- It provides higher quality and reliability.
- It increases efficiency in information flows, reducing time and saving costs.
- It optimises and speeds up access to legislation from different legal systems by providing information in a uniform manner.
- It improves the interoperability of legal systems, facilitating cooperation between countries.
- It facilitates the re-use of legal data as a basis for new value-added services and products that improve the efficiency of the sector.
- It boosts transparency and accountability of Member States.
Implementation of the ELI in Spain
The ELI is a flexible system that must be adapted to the peculiarities of each territory. In the case of the Spanish legal system, there are various legal and technical aspects that condition its implementation.
One of the main conditioning factors is the plurality of issuers, with regulations at national, regional and local level, each of which has its own means of official publication. In addition, each body publishes documents in the formats it considers appropriate (pdf, html, xml, etc.) and with different metadata. To this must be added linguistic plurality, whereby each bulletin is published in the official languages concerned.
It was therefore agreed that the implementation of the ELI would be carried out in a coordinated manner by all administrations, within the framework of the Sectoral Commission for e-Government (CSAE), in two phases:
- Due to the complexity of local regulations, in the first phase, it was decided to address only the technical specification applicable to the State and the Autonomous Communities, by agreement of the CSAE of 13 March 2018.
- In February 2022, a new version was drafted to include local regulations in its application.
With this new specification, the common guidelines for the implementation of the ELI in the Spanish context are established, but respecting the particularities of each body. In other words, it only includes the minimum elements necessary to guarantee the interoperability of the legal information published at all levels of administration, but each body is still allowed to maintain its own official journals, databases, internal processes, etc.
With regard to the temporal scope, bodies have to apply these specifications in the following way:
- State regulations: apply to those published from 29/12/1978, as well as those published before if they have a consolidated version.
- Autonomous Community legislation: applies to legislation published on or after 29/12/1978.
- Local regulations: each entity may apply its own criteria.
How to implement the ELI?
The website https://www.elidata.es/ offers technical resources for the application of the identifier. It explains the contextual model and provides different templates to facilitate its implementation.
It also offers the list of common minimum metadata, among other resources.
In addition, to facilitate national coordination and the sharing of experiences, information on the implementation carried out by the different administrations can also be found on the website.
The ELI is already applied, for example, in the Official State Gazette (BOE). From its website it is possible to access all the regulations in the BOE identified with ELI, distinguishing between state and autonomous community regulations. If we take as a reference a regulation such as Royal Decree-Law 24/2021, which transposed several European directives (including the one on open data and reuse of public sector information), we can see that it includes an ELI permalink.
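As an illustration of how these identifiers are composed, the sketch below builds the ELI path for the Royal Decree-Law mentioned above. The template mirrors the jurisdiction/type/year/month/day/number pattern visible in BOE permalinks; treat it as an illustrative assumption rather than the normative Spanish ELI specification:

```python
# Sketch: composing an ELI identifier for Spanish state legislation.
# The path template is an assumption based on the pattern visible in
# BOE permalinks, not the official technical specification.

def build_eli(jurisdiction: str, doc_type: str, year: int,
              month: int, day: int, number: int) -> str:
    """Return an ELI path such as /eli/es/rdl/2021/11/02/24."""
    return f"/eli/{jurisdiction}/{doc_type}/{year}/{month:02d}/{day:02d}/{number}"

# Royal Decree-Law 24/2021, of 2 November (the example cited above):
print("https://www.boe.es" + build_eli("es", "rdl", 2021, 11, 2, 24))
# -> https://www.boe.es/eli/es/rdl/2021/11/02/24
```

Because the same pattern is shared across publication bodies, a re-user can construct or parse these URIs programmatically and link legislation across systems.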
In short, we are faced with a very useful common mechanism to facilitate the interoperability of legal information, which can promote its reuse not only at a national level, but also at a European level, favouring the creation of the European Union's area of freedom, security and justice.
Content prepared by the datos.gob.es team.
A data space is an ecosystem where, on a voluntary basis, the data of its participants (public sector, large and small technology or business companies, individuals, research organizations, etc.) are pooled. Thus, under a context of sovereignty, trust and security, products or services can be shared, consumed and designed from these data spaces.
This is especially important because, if users feel that they have control over their own data, thanks to clear and concise communication about the terms and conditions governing its use, they will effectively share that data, thus promoting the economic and social development of the environment.
In line with this idea, and with the aim of improving the design of data spaces, the Data Office has established a series of characteristics that set out the guidelines to be followed in designing efficient and functional data spaces from an architectural point of view.
We summarize in the following visual some of the most important characteristics for the creation of data spaces. To consult the original document and all the standards proposed by the Data Office, please download the attached document at the end of this article.
(You can download the accessible version in Word here.)

This report published by the European Data Portal (EDP) aims to advance the debate on the medium and long-term sustainability of open data portal infrastructures.
It provides recommendations to open data publishers on how to make open data available and how to promote its reuse. It is based on previous work by the data.europa.eu team, on research on open data management, and on the interaction between humans and data.
Considering the conclusions, 10 recommendations are proposed for increasing the reuse of data.
The report is available at this link: "Principles and recommendations to make data.europa.eu data more reusable: A strategy mapping report".
One of the key actions that we recently highlighted as necessary to build the future of open data in our country is the implementation of processes to improve data management and governance. It is no coincidence that proper data management in our organisations is becoming an increasingly complex and in-demand task. Data governance specialists, for example, are increasingly sought after (there are more than 45,000 active job openings in the US for a role that was virtually non-existent not so long ago), and dozens of data management platforms now advertise themselves as data governance platforms.
But what is really behind these buzzwords? What do we actually mean by data governance? In reality, we are talking about a series of quite complex transformation processes that affect the whole organisation.
This complexity is perfectly reflected in the framework proposed by the Open Data Policy Lab, where we can clearly see the different overlapping layers of the model and their main characteristics: a journey through the elaboration of data, collaboration with data as the main tool, knowledge generation, the establishment of the necessary enabling conditions and the creation of added value.

Let's now peel the onion and take a closer look at what we will find in each of these layers:
The data lifecycle
We should never consider data as isolated elements, but as part of a larger ecosystem, which is embedded in a continuous cycle with the following phases:
- Collection or collation of data from different sources.
- Processing and transformation of data to make it usable.
- Sharing and exchange of data between different members of the organisation.
- Analysis to extract the knowledge being sought.
- Using data according to the knowledge obtained.
Collaboration through data
It is not uncommon for the life cycle of data to take place solely within the organisation where it originates. However, we can increase the value of that data exponentially, simply by exposing it to collaboration with other organisations through a variety of mechanisms, thus adding a new layer of management:
- Public interfaces that provide selective access to data, enabling new uses and functions.
- Trusted intermediaries that function as independent data brokers. These brokers coordinate the use of data by third parties, ensuring its security and integrity at all times.
- Data pools that provide a common, joint, complete and coherent view of the data by aggregating portions from different sources.
- Research and analysis partnerships, granting access to certain data for the purpose of generating specific knowledge.
- Prizes and challenges that give access to specific data for a limited period of time to promote new innovative uses of data.
- Intelligence generation, whereby the knowledge acquired by the organisation through the data is also shared and not just the raw material.
Insight generation
Thanks to the collaborations established in the previous layer, it will be possible to carry out new studies of the data that will allow us both to analyse the past and to try to extrapolate the future using various techniques such as:
- Situational analysis, knowing what is happening in the data environment.
- Cause-and-effect insights, looking for an explanation of the origin of what is happening.
- Prediction, trying to infer what will happen next.
- Impact assessment, establishing what we expect should happen.
Enabling conditions
There are a number of procedures that, when applied on top of an existing collaborative data ecosystem, can lead to even more effective use of data through techniques such as:
- Publish with a purpose, with the aim of coordinating data supply and demand as efficiently as possible.
- Foster partnerships, including in our analysis those groups of people and organisations that can help us better understand real needs.
- Prioritize subnational efforts, strengthening alternative data sources by providing the resources needed to create new data sources in untapped areas.
- Center data responsibility, establishing an accountability framework around data that takes into account the principles of fairness, engagement and transparency.
Value generation
Scaling up the ecosystem, and establishing the right conditions for it to flourish, can lead to data economies of scale from which we can derive new benefits such as:
- Improving governance and operations of the organisation itself through the overall improvements in transparency and efficiency that accompany openness processes.
- Empowering people by providing them with the tools they need to perform their tasks in the most appropriate way and make the right decisions.
- Creating new opportunities for innovation, the creation of new business models and evidence-led policy making.
- Solving problems by optimising processes and services and interventions within the system in which we operate.
As we can see, the concept of data governance is actually much broader and more complex than one might initially expect. It encompasses a number of key actions and tasks that, in most organisations, will be practically impossible to centralise in a single role or through a single tool. Therefore, when establishing a data governance system in an organisation, we should approach the challenge as an integral transformation process, a paradigm shift in which practically all members of the organisation are involved to a greater or lesser extent. A good way to face this challenge with greater ease and better guarantees is through the adoption and implementation of some of the reference frameworks and standards that have been created in this respect and that correspond to different parts of this model.
Content prepared by Carlos Iglesias, Open data Researcher and consultant, World Wide Web Foundation.
The contents and views expressed in this publication are the sole responsibility of the author.
The open data portal of Aragon emerged in 2012 and has not stopped growing since then. It currently has more than 2,100 datasets and a large number of applications. During these years it has incorporated new features to adapt to the real needs of citizens, such as its information structure that improves interoperability and homogenises the available data, or the incorporation of applications such as Open Analytics Data, which offers statistics related to the use of the most important portals of the Government of Aragon.
In the last few months, they have been working on the Aragon Open Data Focus Initiative, aimed at getting to know open data publishers and users better. To find out more about this interesting project and the rest of the activities they are developing, we have spoken to Julián Moyano, Technical Advisor of the General Directorate of Electronic Administration and Information Society, Department of Science, University and Knowledge Society of the Government of Aragon.
Full interview
1. What is Aragon Open Data Focus and what are its strategic points?
Aragon Open Data Focus is a way of bringing the data of the Government of Aragon's open data portal closer to society, and to those people who are not so familiar with the data, in order to encourage their use and interpretation.
Bringing the data available in Aragon Open Data closer to society has required a better understanding of the real needs of the users and groups involved. These are the four strategic points of this work:
- Firstly, we have started with an initial analysis of the data and services available in Aragon Open Data.
- Second, through this analysis we have defined potential groups of users and agents of interest.
- Thirdly, from this point onwards, different meetings have been organised with these groups to look for synergies and establish lines of work.
- Fourthly, all this has resulted in the service called Aragon Open Data Focus with digital stories and narratives based on available open data and the concerns of the users.
2. To learn more about the users’ needs, you have held various virtual meetings. What groups have you met with and what conclusions have you drawn from these conversations?
The meetings have been a very important part of Aragon Open Data Focus. At the beginning of 2020, eight in-person meetings had been planned, to encourage participation and direct contact with the agents involved. Due to the coronavirus pandemic, the first of those meetings had to be suspended, and the agenda of participants and the calendar were rescheduled so that the meetings could be held by videoconference. There has been a great deal of online activity, and it has been very well received by the different groups of participants. The groups we have worked with have been:
- Public sector organisations: focused on companies and other public sector entities.
- Storytellers: journalists.
- Companies that reuse data.
- Students.
- Directors, managers and senior executives of private and public organisations.
- Developers and programmers from the technology sector.
- Auditors of public action, citizens' groups and social movements.
- Citizens, in general, new to open data.
The conclusions of all these meetings have been very valuable. The first of these is that it is necessary to talk and debate "one-on-one" with the agents involved, with the recipients of the services, with the possible and potential users of the data, in order to know their needs much better and share them in Aragon Open Data.
Some of the conclusions I would like to highlight that were obtained with the user groups were:
- Those responsible for public sector bodies are demanding more cooperation within the administrations in order to correctly articulate the effort in terms of transparency and open data.
- Users with a more technical profile who are familiar with the data demand more data in open formats, with better quality, improved descriptions, a greater level of disaggregation and real-time updates.
- Interested parties and users with more general profiles want possibilities to relate data from different sources, visualisations, geopositioning of available open data, map visualisations and downloadable geographical information in open formats and with the possibility of integrating them into other websites.
- In addition, open data portals need to improve their promotion, dissemination and ongoing engagement with data providers and users. Permanent and rapid attention to new demands for open data and to the resolution of users' doubts is also requested, linking any action to the culture of openness and transparency on the part of Public Administrations.
It should also be noted that the content, dynamics and conclusions of each of the events are available on the Government of Aragon's website: https://www.aragon.es/-/los-datos-abiertos-mas-cerca-de-la-sociedad-aragon-open-data-focus.
3. What actions are you developing to respond to user requests?
The meetings have been intense, full of ideas, proposals and debates. Now it is time to record the conclusions of these meetings in order to work on the action lines and demands suggested.
It is worth emphasising that these meetings and their conclusions are aligned with the Aragon Open Data Strategy, which analyses the evolution of the Aragon Open Data web portal and the map of agents (journalists, researchers, citizens) that work with open data, in order to offer an integral vision of the service. That is why Aragon Open Data Focus has a place in this Strategy.
With this, we continue profiling, working on and complying with its lines of action, which allow us to promote the involvement of users and develop a data governance model that covers their demands: opening new resources, improving existing ones and encouraging their use.
4. What obstacles have you encountered when setting up Aragon Open Data Focus?
The main obstacle, as I have already pointed out, has been the coronavirus pandemic. Aragón Open Data Focus was designed with a markedly face-to-face character, to talk and debate directly with those involved through participatory dynamics. We even had events planned in small villages and in rural Aragon, to disseminate and share ideas about open data and to learn first-hand about people's demands and needs. The pandemic made us change the dynamics and move online, which has not been an obstacle either to holding these "meetings" and drawing conclusions.
Beyond that, we have noticed that users have great expectations about open data, and sometimes it is not easy to respond to them in this type of event, for different reasons: the data may not exist in the administration (it may be the responsibility of another organisation), there may be technical problems, or the available open data may have limiting characteristics. These circumstances, even when they explain rather than excuse the situation, are difficult for users and data demanders to accept in the 21st century, in the era of data and the digital economy.
5. What are the benefits for public administrations of this type of initiative?
Above all, it allows us to go deeper into the real needs of users and groups with whom we have worked in order to better focus our actions and future lines of work.
6. A few years ago, you told us that the datasets most demanded by users of Aragon Open Data were those related to the budget. Has this situation changed? What type of information do re-users demand now?
The budget data is still one of the most used in Aragon Open Data, both as open datasets and in the service that reflects it: https://presupuesto.aragon.es/
Today, if we look at the number of accesses, the most demanded resource (with twice as many accesses as the second most-accessed) is data related to the coronavirus in Aragon, followed by cartographic data, data on the CAP (Common Agricultural Policy) and statistical data.
7. How do you see the panorama of open data in Spain? What strengths do you think there are? And weaknesses? How could they be resolved?
The outlook in Spain is promising. Much has already been done by different public administrations at all territorial levels to provide data in open formats. Now that the offer has grown in number of datasets, the portals have been adapting to the demands of society, which not only wants quantity but also very specific data to make the most of, for example: data on mobility, passenger transport, telecommunications infrastructures, digital services and health, in real time. A strength is that this is in line with what the European Union has legislated in its new directive on open data and the re-use of public sector information. In other words, there is important regulatory and institutional support for open data initiatives in Europe, in order to make the continent a truly data-based digital marketplace that improves the lives of citizens.
A weak point, despite the good regulatory and legal backing, may be the response time for including a dataset requested from a given portal; it would therefore be advisable to further speed up the data opening processes. In addition, when there is no supplier acting as data manager, the possibilities of current technologies, for example data recognition or automatic schema detection with quality and security validators, could be used so that open data can be opened and made available with minimal human intervention.
If data is an asset of the public administrations serving citizens, companies and third parties in this new digital economy, administrations also have to shed the aura of closure and ownership that they sometimes give off.
The Aporta Challenge, in line with many other initiatives promoted by public administrations, could not ignore the great challenges we are facing in 2020. For this reason, its third edition, while fulfilling its usual objective of promoting the use of data and related technologies, aims to contribute to solving problems related to digital education. This is, without doubt, one of the areas where the need for new innovations has been most evident, to ensure that the pandemic does not cause serious damage to the potential of the younger generations.
With the slogan "The value of data in digital education", datos.gob.es is proposing an Aporta Challenge that in 2020 rewards ideas and prototypes that identify new opportunities for capturing, analysing and using the intelligence of data in the development of solutions for the educational sphere at any of its stages.
Identifying a problem
If we were to approach participation in the challenge as a data science project, the first thing we would do is determine the question we would like to answer; in short, choose a problem worth working on. In this article we propose some lines of work, but they are not restrictions; they are only intended to serve as inspiration and make it easier to choose an educational challenge with great impact. We must always aspire to improve the world.
On the other hand, we can look at the large educational gaps defined by the Educa en Digital programme, which aims to complement the Digitalisation and Digital Skills Plan and to promote the digital transformation of education in Spain, making intensive use of ICT both in the classroom and in non-presential formats, and tackling specific problems thanks to developments linked to data and artificial intelligence. For each of the specific objectives we can think of a good number of issues on which we can work:
- The provision of digital educational devices and resources. For example: how can we help ensure that access to technology is not a barrier to access to education, especially for the most vulnerable groups? How can we reduce the requirements for accessing educational programmes remotely? How can we rely on the most economical devices that are most widely available to students? And so on.
- The provision of digital educational resources, especially in relation to the previous point. On many occasions the problem we work on does not have to be completely new; we can find a more efficient approach to an apparently resolved issue. For example: how can we help a teacher to better monitor a large number of students? How can we improve the security of the applications used by students on public networks? How can we guarantee the privacy of students? And so on.
- The adequacy of teachers' digital skills. In this line there are also a significant number of questions to be resolved: how can we improve the usability of tools for teachers and students? How can we promote skills related to collaboration or communication when people are not in the same physical space? How can we help STEM skills to be perceived as transversal? And so on.
- The application of artificial intelligence to personalised education, which is almost a holy grail of education. How can we create personalised learning paths for each group of students or, better still, optimise the learning pace of each student according to their individual characteristics? How can we predict the impact of changes in programmes on the evolution of group or student learning? How can we detect and avoid gender bias in models that address any of the above problems?

In short, with the suggestions published in the bases and a little research, it is easy to locate a good number of issues on which we can do our bit to improve digital education. Without forgetting our own experience. We have all been at least students, and perhaps also teachers, at some point.
Examining the prior art
Before we begin our work, we must consider that it is very likely that others, with or without success, have already identified and proposed solutions to the problem we have chosen. We can draw lessons from their success or failure, so reviewing the state of the art is key to focusing our project well. In relation to educational technology, it is interesting to review resources such as:
- The activity of educational technology start-ups in repositories such as EU-startups or the WISE accelerator.
- Awards focused on educational technology such as the prestigious Global Learning XPRIZE or the WISE Prize for Education.
- The list of more than 2,500 educational innovation projects from around the world contained in the book Leapfrogging Inequality: Remaking Education to Help Young People Thrive.
- The solutions that reuse open data in the area of education and that are highlighted by portals such as the European Data Portal or datos.gob.es.
As you will see, many of the projects are focused on solving problems that are mostly present in countries less developed than ours. However, the pandemic has changed the rules of the game from what we could have foreseen and is challenging us again with problems that under normal circumstances we would consider to be overcome.
Locating datasets
Open data is present in almost every problem solved with data-related technologies, and it is usually one ingredient among several, not the only one. The rules of the Aporta Challenge reflect this reality and impose very few restrictions on creators: using data sources listed in datos.gob.es is not even mandatory, despite the portal being the driving force behind the challenge. At least one dataset generated by the public administrations must be used, but it can come from any source and can play any role within the project.
To locate data related to our project we can start with the more than 1,700 datasets of the datos.gob.es data catalogue, which federates a good part of the data available in Spanish portals. In the European Data Portal we can find more than 8,000 datasets related to education from all EU countries and another 3,000 datasets from the catalogue of the European Union open data portal.
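These catalogues can also be queried programmatically. The sketch below searches datos.gob.es through its public API; the "/apidata" endpoint, the paging parameters and the shape of the JSON response are assumptions to be verified against the portal's API documentation:

```python
# Sketch: listing datasets from the datos.gob.es catalogue via its
# public API. Endpoint path and response structure are assumptions;
# check the portal's API documentation before relying on them.
import requests

BASE = "https://datos.gob.es/apidata/catalog/dataset"
params = {"_pageSize": 10, "_page": 0}

response = requests.get(BASE, params=params,
                        headers={"Accept": "application/json"})
response.raise_for_status()

# Defensive access: keep whatever structure the API actually returns.
items = response.json().get("result", {}).get("items", [])
for item in items:
    print(item.get("title"))
```

From there, the results can be filtered by theme or keyword to narrow the search down to education-related datasets.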
International institutions that work for the development of education such as UNICEF or the World Bank also have open data catalogues in which we can locate resources that help us in some part of our project.
The Google dataset search engine, the AWS open data registry or the Microsoft Azure datasets are resources in which we can also find datasets to enrich any data-based project.
The data catalogue of institutions such as the US Government's Institute of Education Sciences, although focused on the United States, undoubtedly contains data of great value for measuring and understanding the impact of initiatives developed to improve education, and can enrich many projects.
We should also bear in mind that cleaning, reconciling and transforming datasets from publicly and openly available sources may not be enough to solve the problem we have chosen. Sometimes we need to work on generating or building our own dataset. In that case, a very good option is to make it publicly and openly available so that it can be reused and improved by others.
Defining the product
Finally, we have to think about the best way to deliver the result of our work so that it can be used by its recipients and have the impact we want. The options are multiple and, again, the rules do not impose restrictions. Some possibilities could be:
- Mobile Apps: The enormous penetration of the iOS and Android platforms means that any product we build for them and publish in their respective stores is guaranteed a huge potential reach. In addition, there are options for multiplatform development, and even for development with little (low-code) or no (no-code) software development knowledge.
- Websites: Web applications are probably still the most common mechanism for making a project of any kind available to society in general. The advances in managed services from the large cloud providers, and the facilities they offer to make infrastructure available for free, mean that it has never been easier to start a project. It is also possible to use no-code platforms such as Appy Pie or low-code platforms such as Appian to reduce the initial barrier if we do not have a software developer on the team.
- Artificial Intelligence Algorithms: It is increasingly common for a data-based project to be delivered in the form of a machine learning or artificial intelligence model. For example, platforms such as Amazon AWS and Microsoft Azure offer marketplaces where algorithms and models can be listed so that they can be consumed by other applications.
- Stories and Visualizations: Sometimes the best way to deliver results is through a visualization or a DataStory that allows you to communicate the result of your work. For this purpose, there are multiple options that range from the utilities that incorporate most of the generic Business Intelligence tools such as Tableau to others specialised in spatial location such as the Spanish Carto.
We wish all participants good luck and encourage you to work on a challenge that has a great impact on society.
Content prepared by Jose Luis Marín, Senior Consultant in Data, Strategy, Innovation & Digitalization.
The contents and points of view reflected in this publication are the sole responsibility of its author.

