Documentation

Data science has become a pillar of evidence-based decision-making in the public and private sectors. In this context, there is a need for a practical and universal guide that transcends technological fads and provides solid and applicable principles. This guide offers a decalogue of good practices that accompanies the data scientist throughout the entire life cycle of a project, from the conceptualization of the problem to the ethical evaluation of the impact.

  1. Understand the problem before looking at the data. The initial key is to clearly define the context, objectives, constraints, and indicators of success. A solid framing prevents later errors.
     
  2. Know the data in depth. Beyond the variables, it involves analyzing their origin, traceability and possible biases. Data auditing is essential to ensure representativeness and reliability.
     
  3. Ensure quality. Without clean data there is no science. EDA techniques, imputation, normalization and control of quality metrics make it possible to build solid and reproducible bases.
     
  4. Document and version. Reproducibility is a scientific condition. Notebooks, pipelines, version control, and MLOps practices ensure traceability and replicability of processes and models.
     
  5. Choose the right model. Sophistication does not always win: the decision must balance performance, interpretability, costs and operational constraints.
     
  6. Measure meaningfully. Metrics should align with goals. Cross-validation, data drift control and rigorous separation of training, validation and test data are essential to ensure generalization (a minimal sketch appears after the decalogue).
     
  7. Visualize to communicate. Visualization is not an ornament, but a language to understand and persuade. Data-driven storytelling and clear design are critical tools for connecting with diverse audiences.
     
  8. Work as a team. Data science is collaborative: it requires data engineers, domain experts, and business leaders. The data scientist must act as a facilitator and translator between the technical and the strategic.
     
  9. Stay up-to-date (and critical). The ecosystem is constantly evolving. It is necessary to combine continuous learning with selective criteria, prioritizing solid foundations over passing fads.
     
  10. Be ethical. Models have a real impact. It is essential to assess bias, protect privacy, ensure explainability and anticipate misuse. Ethics is a compass and a condition of legitimacy.

    Figure: infographic summarising the ten points of the Data Scientist's Decalogue. Source: report “Data Scientist's Decalogue,” datos.gob.es.
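Point 6 of the decalogue, for example, rests on cross-validation and a strict separation of training, validation and test data. The minimal Python sketch below illustrates that principle with scikit-learn on a synthetically generated, purely illustrative dataset; it is a sketch of the idea, not a procedure prescribed by the report.

```python
# Minimal sketch of point 6: hold out a test set and cross-validate on the rest.
# The dataset is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Keep a final test set untouched until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)

# Estimate generalisation on the training portion with 5-fold cross-validation.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"Cross-validation accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Only after model selection, evaluate once on the held-out test set.
model.fit(X_train, y_train)
print(f"Held-out test accuracy: {model.score(X_test, y_test):.3f}")
```

The same pattern connects with point 4: keeping the split, the random seeds and the model version under version control is what makes the experiment reproducible.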

Finally, the report includes a bonus track on Python and R, highlighting that both languages are complementary allies: Python dominates in production and deployment, while R offers statistical rigor and advanced visualization. Knowing both multiplies the versatility of the data scientist.

The Data Scientist's Decalogue is a practical, timeless and cross-cutting guide that helps professionals and organizations turn data into informed, reliable and responsible decisions. Its objective is to strengthen technical quality, collaboration and ethics in a discipline in full expansion and with great social impact.

News

Open data is a fundamental fuel for contemporary digital innovation, creating information ecosystems that democratise access to knowledge and foster the development of advanced technological solutions.

However, the mere availability of data is not enough.  Building robust and sustainable ecosystems requires clear regulatory frameworks, sound ethical principles and management methodologies that ensure both innovation and the protection of fundamental rights. Therefore, the specialised documentation that guides these processes becomes a strategic resource for governments, organisations and companies seeking to participate responsibly in the digital economy.

In this post, we compile recent reports, produced by leading organisations in both the public and private sectors, which offer these key orientations. These documents not only analyse the current challenges of open data ecosystems, but also provide practical tools and concrete frameworks for their effective implementation.

State and evolution of the open data market

Knowing what the open data ecosystem looks like at European and national level, and what changes have occurred in it, is important for making informed decisions and adapting to the needs of the industry. In this regard, the European Commission regularly publishes a Data Markets Report. The latest version is dated December 2024, although use cases exemplifying the potential of data in Europe continue to be published regularly (the latest in February 2025).

On the other hand, from a European regulatory perspective, the latest annual report on the implementation of the Digital Markets Act (DMA) takes a comprehensive view of the measures adopted to ensure fairness and competitiveness in the digital sector. This document is useful for understanding how the regulatory framework that directly affects open data ecosystems is taking shape.

At the national level, the ASEDIE sectoral report on the "Data Economy in its infomediary scope" 2025 provides quantitative evidence of the economic value generated by open data ecosystems in Spain.

The importance of open data in AI

It is clear that the intersection between open data and artificial intelligence is a reality that poses complex ethical and regulatory challenges that require collaborative and multi-sectoral responses. In this context, developing frameworks to guide the responsible use of AI becomes a strategic priority, especially when these technologies draw on public and private data ecosystems to generate social and economic value. Here are some reports that address this objective:

  • Generative AI and Open Data: Guidelines and Best Practices: the U.S. Department of Commerce has published a guide with principles and best practices on how to apply generative artificial intelligence ethically and effectively in the context of open data. The document provides guidelines for optimising the quality and structure of open data in order to make it useful for these systems, including transparency and governance considerations.
  • Good Practice Guide for the Use of Ethical Artificial Intelligence: this guide combines strong ethical principles with clear and enforceable regulatory precepts. In addition to the theoretical framework, it serves as a practical tool for implementing AI systems responsibly, considering both the potential benefits and the associated risks. Collaboration between public and private actors ensures that its recommendations are both technically feasible and socially responsible.
  • Enhancing Access to and Sharing of Data in the Age of AI: this analysis by the Organisation for Economic Co-operation and Development (OECD) addresses one of the main obstacles to the development of artificial intelligence: limited access to quality data and effective models. Through examples, it identifies specific strategies that governments can implement to significantly improve access to, and sharing of, data and certain AI models.
  • A Blueprint to Unlock New Data Commons for AI: the Open Data Policy Lab has produced a practical guide focused on the creation and management of data commons specifically designed to enable public-interest artificial intelligence use cases. The guide offers concrete methodologies for managing data in a way that facilitates the creation of these data commons, including aspects of governance, technical sustainability and alignment with public interest objectives.
  • Practical guide to data-driven collaborations: the Data for Children Collaborative initiative has published a step-by-step guide to developing effective data collaborations, with a focus on social impact. It includes real-world examples, governance models and practical tools to foster sustainable partnerships.

In short, these reports define the path towards more mature, ethical and collaborative data systems. From growth figures for the Spanish infomediary sector to European regulatory frameworks to practical guidelines for responsible AI implementation, all these documents share a common vision: the future of open data depends on our ability to build bridges between the public and private sectors, between technological innovation and social responsibility.

Documentation

The Spanish Data Protection Agency (AEPD) has recently published the Spanish translation of the Guide on Synthetic Data Generation, originally produced by the Data Protection Authority of Singapore. This document provides technical and practical guidance for data protection officers and data managers on how to implement this technology, which makes it possible to simulate real data, maintaining their statistical characteristics without compromising personal information.

The guide highlights how synthetic data can drive the data economy, accelerate innovation and mitigate the risks of security breaches. To this end, it presents case studies, recommendations and best practices aimed at reducing the risks of re-identification. In this post, we analyse the key aspects of the Guide, highlighting its main use cases and examples of practical application.

What are synthetic data? Concept and benefits

Synthetic data is artificial data generated using mathematical models specifically designed for artificial intelligence (AI) or machine learning (ML) systems. This data is created by training a model on a source dataset to imitate its characteristics and structure, but without exactly replicating the original records.

High-quality synthetic data retain the statistical properties and patterns of the original data. They therefore allow for analyses that produce results similar to those that would be obtained with real data. However, being artificial, they significantly reduce the risks associated with the exposure of sensitive or personal information.

For more information on this topic, you can read the monographic report Synthetic data: what are they and what are they used for?, which provides detailed information on the theoretical foundations, methodologies and practical applications of this technology.

The implementation of synthetic data offers multiple advantages for organisations, for example:

  • Privacy protection: allow data analysis while maintaining the confidentiality of personal or commercially sensitive information.
  • Regulatory compliance: make it easier to follow data protection regulations while maximising the value of information assets.
  • Risk reduction: minimise the chances of data breaches and their consequences.
  • Driving innovation: accelerate the development of data-driven solutions without compromising privacy.
  • Enhanced collaboration: Enable valuable information to be shared securely across organisations and departments.

Steps to generate synthetic data

To properly implement this technology, the Guide on Synthetic Data Generation recommends following a structured five-step approach:

  1. Know the data: clearly understand the purpose of the synthetic data and the characteristics of the source data to be preserved, setting precise targets for the threshold of acceptable risk and the expected utility.
  2. Prepare the data: identify the key insights to be retained, select relevant attributes, remove or pseudonymise direct identifiers, and standardise formats and structures in a well-documented data dictionary.
  3. Generate synthetic data: select the most appropriate methods according to the use case, assess quality through completeness, fidelity and usability checks, and iteratively adjust the process to achieve the desired balance (a minimal sketch follows this list).
  4. Assess re-identification risks: apply attack-based techniques to determine the possibility of inferring information about individuals or their membership of the original set, ensuring that risk levels are acceptable.
  5. Manage residual risks: implement technical, governance and contractual controls to mitigate identified risks, properly documenting the entire process.
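As an illustration of step 3, the sketch below generates a simple synthetic table by fitting a multivariate normal distribution to a hypothetical numerical source dataset and sampling new records from it. This is only a minimal example of the idea; the Guide does not prescribe a particular algorithm, and real projects use more sophisticated generators together with the risk assessments of steps 4 and 5.

```python
# Minimal sketch of step 3: fit a simple statistical model to numerical source
# data and sample artificial records from it. Illustrative only; the source
# table is invented and real projects add the checks of steps 4 and 5.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Hypothetical source data (in practice: the prepared dataset from step 2).
source = pd.DataFrame({
    "age": rng.normal(45, 12, size=500).clip(18, 90),
    "income": rng.lognormal(mean=10, sigma=0.4, size=500),
})

# Fit a multivariate normal to the source columns.
mean = source.mean().to_numpy()
cov = np.cov(source.to_numpy(), rowvar=False)

# Sample the same number of synthetic records.
synthetic = pd.DataFrame(
    rng.multivariate_normal(mean, cov, size=len(source)),
    columns=source.columns,
)

# Quick fidelity check: compare basic statistics and correlations.
print(source.describe().loc[["mean", "std"]])
print(synthetic.describe().loc[["mean", "std"]])
print(source.corr(), synthetic.corr(), sep="\n")
```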

Practical applications and success stories

To realise all these benefits, synthetic data can be applied in a variety of scenarios that respond to specific organisational needs. The Guide mentions, for example:

1. Generation of datasets for training AI/ML models: synthetic data solves the problem of the scarcity of labelled (i.e. usable) data for training AI models. Where real data are limited, synthetic data can be a cost-effective alternative. In addition, they make it possible to simulate extraordinary events or to increase the representation of minority groups in training sets, an interesting application for improving the performance of AI models and the representativeness of all social groups.

2. Data analysis and collaboration: this type of data facilitates the exchange of information for analysis, especially in sectors such as health, where the original data is particularly sensitive. In this sector, as in others, synthetic data provide stakeholders with a representative sample of the actual data without exposing confidential information, allowing them to assess the quality and potential of the data before formal agreements are made.

3. Software testing: synthetic data is very useful for system development and software testing because it allows realistic, but not real, data to be used in development environments, thus avoiding possible personal data breaches if the development environment is compromised.

The practical application of synthetic data is already showing positive results in various sectors:

I. Financial sector: fraud detection. J.P. Morgan has successfully used synthetic data to train fraud detection models, creating datasets with a higher percentage of fraudulent cases that significantly improved the models' ability to identify anomalous behaviour.

II. Technology sector: research on AI bias. Mastercard collaborated with researchers to develop methods to test for bias in AI using synthetic data that maintained the true relationships of the original data, but were private enough to be shared with outside researchers, enabling advances that would not have been possible without this technology.

III. Health sector: safeguarding patient data. Johnson & Johnson implemented AI-generated synthetic data as an alternative to traditional anonymisation techniques to process healthcare data, achieving a significant improvement in the quality of analysis by effectively representing the target population while protecting patients' privacy.

The balance between utility and protection

It is important to note that synthetic data are not inherently risk-free. The similarity to the original data could, in certain circumstances, allow information about individuals or sensitive data to be leaked. It is therefore crucial to strike a balance between data utility and data protection.

This balance can be achieved by implementing good practices during the process of generating synthetic data, incorporating protective measures such as:

  • Adequate data preparation: removal of outliers, pseudonymisation of direct identifiers and generalisation of granular data.
  • Re-identification risk assessment: analysis of the possibility that synthetic data can be linked to real individuals.
  • Implementation of technical controls: adding noise to data, reducing granularity or applying differential privacy techniques (a minimal sketch follows this list).
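As a simple illustration of the last control, the sketch below adds Laplace noise to an aggregate count, which is the basic mechanism behind differential privacy. The count and epsilon values are illustrative assumptions, not recommendations from the Guide.

```python
# Minimal sketch of the "adding noise" control: a Laplace mechanism applied to
# a count query, the basic building block of differential privacy.
import numpy as np

rng = np.random.default_rng(seed=1)

def noisy_count(true_count: int, epsilon: float) -> float:
    """Return a noisy, differentially private version of a count.

    The sensitivity of a count query is 1, so the Laplace scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

true_count = 1342  # e.g. records sharing a sensitive attribute (illustrative)
print(noisy_count(true_count, epsilon=1.0))   # moderate noise
print(noisy_count(true_count, epsilon=0.1))   # stronger privacy, more noise
```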

Synthetic data represent an exceptional opportunity to drive data-driven innovation while respecting privacy and complying with data protection regulations. Their ability to generate statistically representative but artificial information makes them a versatile tool for multiple applications, from AI model training to inter-organisational collaboration and software development.

By properly implementing the best practices and controls described in the Guide on Synthetic Data Generation translated by the AEPD, organisations can reap the benefits of synthetic data while minimising the associated risks, positioning themselves at the forefront of responsible digital transformation. The adoption of privacy-enhancing technologies such as synthetic data is not only a defensive measure, but a proactive step towards an organisational culture that values both innovation and data protection, both of which are critical to success in the digital economy of the future.

News

How can public administrations harness the value of data? This question is not a simple one to address; its answer is conditioned by several factors that have to do with the context of each administration, the data available to it and the specific objectives set.

However, there are reference guides that can help define a path to action. One of them, published by the European Commission through the EU Publications Office, is the Data Innovation Toolkit, which serves as a strategic compass for navigating this complex data innovation ecosystem.

This tool is not a simple manual as it includes templates to make the implementation of the process easier. Aimed at a variety of profiles, from novice analysts to experienced policy makers and technology innovators, Data Innovation Toolkit is a useful resource that accompanies you through the process, step by step.

It aims to democratise data-driven innovation by providing a structured framework that goes beyond the mere collection of information. In this post, we will analyse the contents of the European guide, as well as the references it provides for good innovative use of data.

Structure covering the data lifecycle

The guide is organised in four main steps, which address the entire data lifecycle.

  1. Planning

The first part of the guide focuses on establishing a strong foundation for any data-driven innovation project. Before embarking on any process, it is important to define objectives. To do so, the Data Innovation Toolkit suggests a deep reflection that requires aligning the specific needs of the project with the strategic objectives of the organisation. In this step, stakeholder mapping is also key. This implies a thorough understanding of the interests, expectations and possible contributions of each actor involved. This understanding enables the design of engagement strategies that maximise collaboration and minimise potential conflicts.

To create a proper data innovation team, we can use a RACI matrix (Responsible, Accountable, Consulted, Informed) to define precise roles and responsibilities. It is not just about bringing professionals together, but about building multidisciplinary teams where each member understands their exact role and contribution to the project. To assist in this task, the guide provides the following tools (a minimal RACI sketch follows the list):

  • Challenge definition tool: to identify and articulate the key issues they seek to address, summarising them in a single statement.
  • Stakeholder mapping tool: to visualise the network of individuals and organisations involved, assessing their influence and interests.
  • Team definition tool: to make it easier to identify people in your organisation who can help you.
  • Tool to define roles: to, once the necessary profiles have been defined, determine their responsibilities and role in the data project in more detail, using a RACI matrix.
  • Tool to define Personas: "Personas" are behavioural archetypes that represent specific types of users. This tool helps to create these detailed profiles, which represent the users or clients who will be involved in the project.
  • Tool for mapping the Data Journey: to produce a concise representation describing, step by step, how a user interacts with their data. The process is represented from the user's perspective, describing what happens at each stage of the interaction and the touch points.
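As a purely illustrative complement to the RACI matrix mentioned above, the sketch below encodes a small matrix as a pandas DataFrame; the tasks and roles are hypothetical, and the toolkit provides its own templates for this exercise.

```python
# Minimal, hypothetical RACI matrix: rows are project tasks, columns are roles,
# cells hold R (Responsible), A (Accountable), C (Consulted) or I (Informed).
import pandas as pd

raci = pd.DataFrame(
    {
        "Data scientist":  ["R", "R", "C"],
        "Data engineer":   ["C", "R", "R"],
        "Domain expert":   ["A", "C", "I"],
        "Project sponsor": ["I", "A", "A"],
    },
    index=["Define the challenge", "Prepare the data", "Deploy the solution"],
)
print(raci)
```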
  2. Collection and processing

Once the team has been set up and the objectives have been identified, the data are classified in a way that goes beyond the traditional division between quantitative and qualitative data (a minimal sketch follows the examples below).

Quantitative scope:

  • Discrete data, such as the number of complaints in a public service, represent not only a number, but an opportunity to systematically identify areas for improvement. They allow administrations to map recurrent problems and design targeted interventions.
  • Continuous data, such as response times for administrative procedures, provide a snapshot of operational efficiency. It is not just a matter of measuring, but of understanding the factors that influence the variability of these times and designing more agile and efficient processes.

Qualitative:

  • Nominal (name) data enables the categorisation of public services, allowing for a more structured understanding of the diversity of administrative interventions.

  • Ordinal data (ordered categories), such as satisfaction ratings, become a prioritisation tool for continuous improvement.
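As a minimal illustration of this classification, the sketch below tags a few hypothetical public-service variables with their measurement level; the variable names and assignments are assumptions for the example, not taken from the toolkit.

```python
# Hypothetical variables from a public-service dataset, tagged with the
# measurement level described above. Purely illustrative.
measurement_levels = {
    "number_of_complaints": "quantitative / discrete",
    "response_time_days":   "quantitative / continuous",
    "service_category":     "qualitative / nominal",
    "satisfaction_rating":  "qualitative / ordinal",
}

for variable, level in measurement_levels.items():
    print(f"{variable:22s} -> {level}")
```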

A series of checklists are available in the document to review this aspect:

  • Checklist of data gaps: to identify if there are any gaps in the data to be used and, if so, how to fill them.
  • Template for data collection: to align the dataset to the objective of the innovative analysis.
  • Checklist of data collection: to ensure access to the data sources needed to run the project.
  • Checklist of data quality: to review the quality level of the dataset.
  • Data processing checklist: to check that data is being processed securely, efficiently and in compliance with regulations.
  3. Sharing and analysis

At this point, the Data Innovation Toolkit proposes four analysis strategies that transform data into actionable knowledge (a minimal sketch follows the list).

  1. Descriptive analysis: goes beyond the simple visualisation of historical data, allowing the construction of narratives that explain the evolution of the phenomena studied.
  2. Diagnostic analysis: delves deeper into the investigation of causes, unravelling the hidden patterns that explain the observed behaviours.
  3. Predictive analytics: becomes a strategic planning tool, allowing administrations to prepare for future scenarios.
  4. Prescriptive analysis: goes a step further, not only projecting trends, but recommending concrete actions based on data modelling.
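The sketch below walks through the four strategies on a hypothetical monthly series of service requests: a descriptive summary, a correlation as a first diagnostic clue, a naive linear forecast, and a toy prescriptive staffing rule. All figures and the 20-requests-per-employee assumption are invented for the example.

```python
# Illustrative sketch of the four analysis strategies on invented data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)
months = pd.period_range("2023-01", periods=24, freq="M")
requests = pd.Series(200 + 5 * np.arange(24) + rng.normal(0, 10, 24), index=months)
staff = 10 + 0.02 * requests + rng.normal(0, 0.5, 24)

# 1. Descriptive: what happened?
print("Average monthly requests:", round(requests.mean(), 1))

# 2. Diagnostic: why did it happen? (a simple correlation as a first clue)
print("Correlation between requests and staff on duty:", round(requests.corr(staff), 2))

# 3. Predictive: what is likely to happen? (naive linear trend)
slope, intercept = np.polyfit(np.arange(24), requests.to_numpy(), deg=1)
forecast_next = slope * 24 + intercept
print("Forecast for next month:", round(forecast_next, 1))

# 4. Prescriptive: what should we do? (toy rule: 20 requests per staff member)
print("Recommended staff next month:", int(np.ceil(forecast_next / 20)))
```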

In addition to analysis, the ethical dimension is fundamental. The guide therefore sets out strict protocols to ensure secure data transfers, regulatory compliance, transparency and informed consent. In this section, the following templates and checklists are provided:

  • Data sharing template: to ensure secure, legal and transparent sharing.
  • Checklist for data sharing: to make sure all the necessary steps are taken to share data securely, ethically and in line with the defined objectives.
  • Data analysis template: to conduct a proper analysis and obtain insights that are useful and meaningful for the project.
  4. Use and evaluation

The last stage focuses on converting the insights into real actions. The communication of results, the definition of key performance indicators (KPIs), impact measurement and scalability strategies become tools for continuous improvement.

A collaborative resource in continuous improvement

In short, the toolkit offers a comprehensive transformation: from evidence-based decision making to personalising public services, increasing transparency and optimising resources. You can also consult the checklists and tools available in this section, which are:

  • Checklist for data use: to review that the data and the conclusions drawn are used in an effective, accountable and goal-oriented manner.
  • Data innovation through KPI tool: to define the KPIs that will measure the success of the process.
  • Impact measurement and success evaluation tools: to assess the success and impact of the innovation in the data project.
  • Data innovation scalability plan: to identify strategies to scale the project effectively.

In addition, the toolkit's repository of data innovation resources is a dynamic catalogue of knowledge, including expert articles, implementation guides, case studies and learning materials.

You can access here the list of materials provided by the Data Innovation Toolkit.

You can even contact the development team if you have any questions or would like to contribute to the repository.

To conclude, harnessing the value of data with an innovative perspective is not a magic leap, but a gradual and complex process. On this path, the Data Innovation Toolkit can be useful as it offers a structured framework. Effective implementation will require investment in training, cultural adaptation and long-term commitment.

Blog

We are living in a historic moment in which data is a key asset, on which many small and large decisions of companies, public bodies, social entities and citizens depend every day. It is therefore important to know where each piece of information comes from, to ensure that the issues that affect our lives are based on accurate information.

What is data citation?

When we talk about "citing" we refer to the process of indicating which external sources have been used to create content. This is a commendable practice that applies to all data, including public data, as enshrined in our legal system. In the case of data provided by administrations, Royal Decree 1495/2011 includes the need for the reuser to cite the source of origin of the information.

To assist users in this task, the Publications Office of the European Union published Data Citation: A guide to best practice, which discusses the importance of data citation and provides recommendations for good practice, as well as the challenges to be overcome in order to cite datasets correctly.

Why is data citation important?

The guide mentions the most relevant reasons why it is advisable to carry out this practice:

  • Credit. Creating datasets takes work. Citing the author(s) allows them to receive feedback and to know that their work is useful, which encourages them to continue working on new datasets.
  • Transparency. When data is cited, the reader can refer to it to review it, better understand its scope and assess its appropriateness.
  • Integrity. Users should not engage in plagiarism. They should not take credit for the creation of datasets that are not their own.
  • Reproducibility. Citing the data allows a third party to attempt to reproduce the same results, using the same information.
  • Re-use. Data citation makes it easier for more and more datasets to be made available and thus to increase their use.
  • Text mining. Data is not only consumed by humans, it can also be consumed by machines. Proper citation will help machines better understand the context of datasets, amplifying the benefits of their reuse.

General good practice

Of all the general good practices included in the guide, some of the most relevant are highlighted below:

  • Be precise. It is necessary that the data cited are precisely defined. The data citation should indicate which specific data have been used from each dataset. It is also important to note whether they have been processed and whether they come directly from the originator or from an aggregator (such as an observatory that has taken data from various sources).  
  • Use "persistent identifiers" (PIDs). Just as every book in a library has an identifier, so too can (and should) every dataset. Persistent identifiers are formal schemes that provide a common nomenclature and uniquely identify datasets, avoiding ambiguities. When citing datasets, it is necessary to locate them and write them as an actionable hyperlink, which can be clicked on to access the cited dataset and its metadata. There are different families of PIDs, but the guide highlights two of the most common: the Handle system and the Digital Object Identifier (DOI).
  • Indicate the time at which the data was accessed. This issue is of great importance when working with dynamic data (which are updated and changed periodically) or continuous data (to which additional data are added without modifying the old data). In such cases, it is important to cite the date of access. In addition, if necessary, the user can add "snapshots" of the dataset, i.e. copies taken at specific points in time.
  • Consult the metadata of the dataset used and the functionalities of the portal in which it is located. Much of the information necessary for the citation is contained in the metadata.
    In addition, data portals can include tools to assist with citation. This is the case of data.europa.eu, where you can find the citation button in the top menu.

  • Rely on software tools. Most of the software used to create documents allows for the automatic creation and formatting of citations, ensuring consistent formatting. In addition, there are specific citation management tools such as BibTeX or Mendeley, which allow the creation of citation databases that take their peculiarities into account, a very useful function when it is necessary to cite numerous datasets in multiple documents.

 

With regard to the order of all this information, there are different guidelines for the general structure of citations. The guide shows the most appropriate forms of citation according to the type of document in which the citation appears (journalistic documents, online, etc.), including examples and recommendations. One example is the Interinstitutional Style Guide (ISG), published by the EU Publications Office. This style guide does not contain specific guidance on how to cite data, but it does contain a general citation structure that can be applied to datasets.

How to cite correctly

The second part of the report contains the technical reference material for creating citations that meet the above recommendations. It covers the elements that a citation should include and how to arrange them for different purposes.

The elements that a citation should include are (a minimal assembly sketch follows the list):

  • Author, which can refer to either the individual who created the dataset (personal author) or the responsible organisation (corporate author).
  • Title of the dataset.
  • Version/edition.
  • Publisher, which is the entity that makes the dataset available and may or may not coincide with the author (in case of coincidence it is not necessary to repeat it).
  • Date of publication, indicating the year in which it was created. It is important to include the time of the last update in brackets.
  • Date of citation, which expresses the date on which the creator of the citation accessed the data, including the time if necessary. For date and time formats, the guide recommends using the DCAT specification, as it offers greater accuracy in terms of interoperability.
  • Persistent identifier.
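As a minimal illustration, the sketch below assembles these elements into a single citation string. The dataset, author and DOI are invented placeholders, and the exact ordering and punctuation are an assumption loosely following the general structure discussed in the guide, not its verbatim template.

```python
# Minimal sketch: assemble the citation elements listed above into one string.
# All values are invented placeholders.
from datetime import datetime, timezone

elements = {
    "author": "Example Statistics Office",            # corporate author
    "title": "Example air quality measurements",
    "version": "v2.1",
    "publisher": "Example Open Data Portal",
    "publication_year": "2023",
    # ISO 8601 timestamp for the date of citation (DCAT-friendly format).
    "cited": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    "pid": "https://doi.org/10.0000/example-dataset",  # placeholder DOI
}

citation = (
    f"{elements['author']}, {elements['title']} ({elements['version']}), "
    f"{elements['publisher']}, {elements['publication_year']} "
    f"[cited {elements['cited']}], {elements['pid']}."
)
print(citation)
```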

The guide ends with a series of annexes containing checklists, diagrams and examples.

If you want to know more about this document, we recommend you to watch this webinar where the most important points are summarised.

Ultimately, correctly citing datasets improves the quality and transparency of the data re-use process, while at the same time stimulating it. Encouraging the correct citation of data is therefore not only recommended, but increasingly necessary.

 

Blog

The Open Government Guide for Public Employees is a manual to guide the staff of public administrations at all levels (local, regional and state) on the concept and conditions necessary to achieve an "inclusive open government in a digital environment". Specifically, the document seeks to have administrations embrace open government as a cross-cutting element of society, fostering its connection with the Sustainable Development Goals.

 It is a comprehensive, practical and well-structured guide that facilitates the understanding and implementation of the principles of open government, providing examples and best practices that foster the development of the necessary skills to facilitate the long-term sustainability of open government.

What is open government?

The guide adopts the most widely accepted definition of open government, based on three axes: 

  • Transparency and access to information (vision axis): Refers to open access to public information to facilitate greater accountability.
  • Citizen participation (voice axis): It offers the possibility for citizens to be heard and intervene to improve decision-making and co-creation processes in public policies.
  • Collaboration (value axis): Focuses on cooperation within the administration or externally, with citizens or civil society organizations, through innovation to generate greater co-production in the design and implementation of public services.

This manual defines these axes and breaks them down into their most relevant elements for better understanding and application. According to the guide, the basic elements of open administration are:

  • An integrity that cuts across all public action.
  • Data, which are "the raw material of governments and public administrations" and, for this reason, must be made available to "any actor", respecting the limits established by law.
  • The use of information and communication technologies (digital), conceived as a "space for the expansion of public action", without neglecting the digital divide.
  • The citizenry is placed at the center of open administration, because it is not only the object of public action, but also "must enjoy a leading role in all the dynamics of transparency, participation and collaboration".
  • Sustainability of government initiatives.

Adapted from a visual of the Open Government Guide for Public Employees. Source: https://funcionpublica.hacienda.gob.es/Secretaria-de-Estado-de-Funcion-Publica/Actualidad/ultimas-noticias/Noticias/2023/04/2023_04_11.html

Benefits of Open Government

With all this, a number of benefits are achieved:

  • Increased institutional quality and legitimacy

  • Increased trust in institutions

  • More targeted policies to serve citizens

  • More equitable access to policy formulation

How can I use the guide?

The guide is very useful because, in order to explain some concepts, it poses challenges so that civil servants themselves can reflect on them and even put them into practice. The authors also propose cases that provide an overview of open government in the world and its evolution, both in terms of the concepts related to it and the laws, regulations, relevant plans and areas of application (including Law 19/2013 on transparency, the Digital Spain 2025 agenda, the Digital Rights Charter and the General Data Protection Regulation, known as GDPR). As an example, the cases mentioned include the Elkar-EKIN Social Inclusion Plan of the Provincial Council of Gipuzkoa and Frena La Curva, an initiative launched by members of the Directorate General of Citizen Participation and the LAAAB of the Government of Aragon during COVID-19.

The guide also includes a self-diagnostic test on accountability, fostering collaboration, bibliographical references and proposals for improvement.

In addition, it offers diagrams and summaries to explain and schematize each concept, as well as specific guidelines to put them into practice. For example, it includes the question "Where are the limits on access to public information?" To answer this question, the guide cites the cases in which access can be given to information that refers to a person's ideology, beliefs, religious or union affiliation (p. 26). With adaptation to specific contexts, the manual could very well serve as a basis for organizing training workshops for civil servants because of the number of relevant issues it addresses and its organization.

The authors are right to also include warnings and constructive criticisms of the situation of open government in institutions. Although they do not point them out directly, they talk about:

  • Black boxes: they are criticized for being closed systems. It is stated that black boxes should be opened and made transparent and that "the representation of sectors traditionally excluded from public decisions should be increased".
  • Administrative language: This is a challenge for real transparency, since, according to a study mentioned in the guide, out of 760 official texts, 78% of them were not clear. Among the most difficult to understand are applications for scholarships, grants and subsidies, and employment-related procedures.
  • The existence of a lack of transparency in some municipalities, according to another study mentioned in the guide. The global open government index, produced by the World Justice Project, places Spain in 24th place, behind countries such as Estonia (14th), Chile (18th), Costa Rica (19th) or Uruguay (21st) and ahead of Italy (28th), Greece (36th) or Romania (51st), among 102 countries. The Open Knowledge Foundation has stopped updating its Global Open Data Index, which focused specifically on open data.

In short, public administration is conceived as a step towards an open state, with the incorporation of the values of openness in all branches of government, including the legislative and judicial branches in addition to the executive.

Additional issues to consider

For those who want to follow the path to open government, there are a number of issues to consider: 

  • The guide can be adapted to different spheres and scales of public administration. But public administration is not homogeneous, nor do the people in it have the same responsibilities, motivations, knowledge or attitudes towards open government. A review of citizen use of open data in the Basque administration concluded that one obstacle to transparency is the lack of acceptance or collaboration in some sectors of the administration itself. A step forward, therefore, could be to conduct internal campaigns to disseminate the advantages for the administration of integrating citizen perspectives and to generate the spaces needed to integrate their contributions.

  • Although the black box model is disappearing from public administration, which is subject to great scrutiny, it has returned in the form of closed and opaque algorithmic systems applied to public administration. There are many studies in the scientific literature (for example, this one) that warn that flawed opaque systems may be operating in public administration without anyone noticing until harmful results are generated. This is an issue that needs to be reviewed.
  •  In order to adapt it to specific contexts, it should be possible to define more concretely what participation, collaboration and co-creation are. As the guide indicates, they imply not only transparency, but also the implementation of collaborative or innovative initiatives. But it is also necessary to ask a series of additional questions: what is a collaborative or innovation initiative, what methodologies exist, how is it organized and how is its success measured?
  • The guide highlights the need to include citizens in open government. When talking about inclusion and participation, organized civil society and academia are mentioned above all, for example, in the Open Government Forum. But there is room for improvement to encourage individual participation and collaboration, especially for people with little access to technology. The guide mentions gender, territorial, age and disability digital divides, but does not explore them. However, when access to many public services, aid and assistance has been platformized (especially after the COVID-19 pandemic), such digital divides affect many people, especially the elderly, low-income and women. Since a generalist guide cannot address all relevant issues in detail, this would merit a separate guide.

Public institutions are increasingly turning to algorithms for effective, fast and inclusive decision-making. Therefore, it is also increasingly relevant to train the administration itself in open government in a digitized and platformized environment. This guide is a great first step for those who want to approach the subject.


Content prepared by Miren Gutiérrez, PhD and researcher at the University of Deusto, expert in data activism, data justice, data literacy and gender disinformation. The contents and views reflected in this publication are the sole responsibility of the author.

News

When launching an open data initiative, it is necessary that everyone involved in its development is aware of the benefits of open data, and that they are clear about the processes and workflows needed to achieve the goals. This is the only way to achieve an initiative with up-to-date data that meets the necessary quality parameters.

This idea was clear to the Alba Smart Initiative, which is why they have created various materials that not only serve to provide knowledge to all those involved, but also to motivate and raise awareness among heads of service and councillors about the need (and obligation) to publish as much information as possible for the use of citizens.

What is Alba Smart?

The Alba Smart project is jointly developed by the city councils of Almendralejo and Badajoz with the aim of advancing in their development as smart cities. Among the areas covered are the control of tourist mobility flows, the creation of an innovation hub, the installation of wifi access points in public buildings, the implementation of social wifi and the management of car parks and fleets.

Within the framework of this project, a platform has been developed to unify the information of devices and systems, thus facilitating the management of public services in a more efficient way. Alba Smart also incorporates a balanced scorecard, as well as an open data portal for each city council (Almendralejo's and Badajoz's are available here).

The Alba Smart initiative has the collaboration of Red.es through the National Plan for Smart Cities.

Activities to promote open data

Within the context of Alba Smart, there is an Open Data working group of the Badajoz City Council, which has launched several activities focused on the dissemination of open data in the framework of a local entity.

One of the first actions they have carried out is the creation of an internal WIKI in which they have been documenting all the materials that have been useful to them. This WIKI facilitates the sharing of internal content, so that all users have at their disposal materials of interest to answer questions such as: what is open data, what roles, tasks and processes are involved, why is it necessary to adopt this type of policies, etc.

In addition, on the public part of the Badajoz website, both Transparency and Open Data have shared a series of documents included in this WIKI that may also be of interest to other local initiatives:

Contents related to Transparency

The website includes a summary section with content on TRANSPARENCY. Among other issues, it includes a list of ITA2017 indicators and their assignment to each City Council Service.

This section also includes the regulatory framework that applies to the territory, as well as external references.

Content related to open data

It also includes a summary section on OPEN DATA, which includes the regulations to be applied and links of interest, in addition to:

  • The list of 40 datasets recommended by the FEMP in 2019. This document includes a series of datasets that should be published by all local authorities in Spain in order to standardise the publication of open data and facilitate its management. The list generated by Alba Smart includes information on the local council service responsible for opening each dataset.
  • A summary of the implementation plan followed by the council, which includes the workflow to be followed, emphasising the need for it to be carried out continuously: identifying new sources of information, reviewing internal processes, etc.
  • A series of training videos, produced in-house, to assist colleagues in the preparation and publication of data. They include tutorials on how to organise data, how to catalogue data, and the review and approval process, among others.
  • The list of vocabularies they use as a reference, which make it possible to systematically organise, categorise and tag information.

Among its next steps is the presentation of the datasets in the map viewer of the municipality. Data from the portal is already being fed into the corporate GIS to facilitate this function in the future.

All these actions are intended to ensure the sustainability of the portal, facilitating its continuous updating by a team with clear working procedures.

News

The COTEC Foundation has recently published a Guide for the opening and sharing of data in the business environment, with recommendations and good practices to boost data reuse in the private sector. The objective is twofold: on the one hand, to raise awareness of the opportunities of data opening and sharing within the companies, and on the other, to accompany organizations in this process.

The guide is organized around 5 steps to follow when launching a data sharing / opening initiative:

  1. Experimentation and awareness, where the potential of company data is identified.
  2. Definition of a strong strategy that includes the objectives and decisions to be taken at the organizational level among other aspects.
  3. Technical preparation, where aspects related to data management and technological infrastructure are addressed.
  4. Implementation of the data life cycle, which is subdivided into the following phases: collect, prepare, share or publish, and maintain.
  5. Initiative monitoring, where the level of success is analysed and actions for continuous improvement are proposed.

The guide also analyses other relevant aspects to consider, such as relationship and financing models, the legal framework, and the roles or skills necessary to launch such an initiative. The document ends with a glossary of terms and acronyms.

To illustrate the numerous benefits that data opening can bring to private companies, the guide includes numerous examples of companies that have launched initiatives to share their data with third parties or have found a new business niche based on data. These examples show the potential offered by these processes, both for companies that provide data and for reusers.

The Cotec Open Data Working Group was launched in September 2018 and has been coordinated by Cristina Oyón and Amaia Martínez from the Department of Strategic Initiatives of SPRI, the Basque Agency for Business Development. The Economic Development Agency of La Rioja (ADER), Alliance 4 Universities, Banco Santander, Bankia, BBVA, Clarke Modet, the Ministry of Employment, Business and Commerce of the Junta de Andalucía, Ecoemblajes España (Ecoembes), EDP Spain, Vodafone Foundation, F. Iniciativas, Indra, the Valencian Institute for Business Competitiveness (IVACE), Orange, Primafrío, Suez Advanced Solutions Spain, Tecnalia and Telefónica are part of this group.

News

On Friday July 7, the Federation of Municipalities and Provinces of Spain (FEMP) presented the "Strategic Guide to Open Data - Minimum Data Sets to be published" which condenses actions, guidelines and recommendations for municipalities and local authorities to publish their data in a useful and effective fashion for its use and reuse by citizens, organizations, companies or other administrations.

The document was prepared by the Open Data Group of the Network of Local Entities for Transparency and Citizen Participation of the Federation, made up of technical managers from various municipalities and other entities, and offers a work itinerary for the opening of data and its reuse by local administrations.

Specifically, this Guide details the model map of an open data portal (basic components, conditions of use or terms of reuse), the legal framework for the reuse of public sector information and data recommendations, as well as the technological plan (tools, platforms and formats), along with measurement systems (indicators for measuring open data initiatives, indexes, popularity and utility of data, etc.).

It also incorporates a Training and Dissemination Plan that includes training for technical personnel who work with data, as well as training and basic skills for citizens.

The work team that developed this document decided to define key concepts, differentiating between transparency and open data, between channel and tool, and between data and information. In the section showing the typical map of an open data portal, it details the essential components that a portal must have (catalogue, search engine, filtering, conditions of use, API services, etc.), as well as a series of recommended components (such as a SPARQL service), and adds a set of recommendations. It also establishes the minimum set of data to be published by each municipality. This classification of mandatory/basic and recommended components was made according to the results of a survey conducted among a group of re-users and managers of open data portals in Spain.

The Guide - available on paper and in digital version - defines a technology plan for integrating technology within a data opening strategy, ranging from the tools needed for data cleaning, extraction and transformation, and visualizers, to platforms, reusable formats, standards, data interoperability and security.

Although this Open Data Guide is specifically aimed at municipalities, it can also become a useful consultation and user manual for other public administrations, institutions, agencies, companies, citizens, entrepreneurs, university students and all other stakeholders who need information about data and its reuse.

In sum, it is a document that aims to familiarize citizens and society with the world of open data, and it offers a set of tools for developing open data projects in an easy, simple way.

News

In order to help the different agents of the national open data community, Iniciativa Aporta regularly prepares and publishes handbooks and methodological guidelines that address different aspects of the openness and re-use of public sector information. In addition, it periodically reviews and updates these materials to reflect the latest industry trends, while redesigning the content to make it more attractive to users.

Thus, it has recently released the new version of the Methodological Guide for Sectoral Open Data Plans, a document created in collaboration with the international open data expert Carlos Iglesias, showing how to articulate an open data project around the needs of a specific sector or theme.

This latest version is enriched with visual elements (graphics, illustrations, diagrams, among others) that facilitate the use and dissemination of the content. In addition, new references have been included to enable the reader to delve into the most important aspects of the preliminary study and the construction of the conceptual model on which a sectoral open data initiative is based. Finally, the content has been reorganized to improve the reading and understanding of the guide, so that the user can take full advantage of the information and data provided.

These actions are included in the lines of action of the Iniciativa Aporta, designed to support the stakeholders of the open data ecosystem in Spain and help them create new business models based on the re-use of open data that contribute to socio-economic growth.
