Data science is all the rage. Professions related to this field are among the most in-demand, according to the latest study ‘Posiciones y competencias más Demandadas 2024’, carried out by the Spanish Association of Human Resources Managers. In particular, there is significant demand for roles related to data management and analysis, such as Data Analyst, Data Engineer and Data Scientist. The rise of artificial intelligence (AI) and the need to make data-driven decisions are driving the integration of these professionals across all sectors.
Universities are aware of this situation and therefore offer a large number of degrees, postgraduate courses and summer courses, both for beginners and for those who want to broaden their knowledge and explore new technological trends. Here are a few examples. These courses combine theory and practice, allowing you to discover the potential of data.
1. Data Analysis and Visualisation: Practical Statistics with R and Artificial Intelligence. National University of Distance Education (UNED).
This seminar offers comprehensive training in data analysis with a practical approach. Students will learn to use the R language and the RStudio environment, with a focus on visualisation, statistical inference and its use in artificial intelligence systems. It is aimed at students from related fields and professionals from various sectors (such as education, business, health, engineering or social sciences) who need to apply statistical and AI techniques, as well as researchers and academics who need to process and visualise data.
- Date and place: from 25 to 27 June 2025 in online and face-to-face mode (in Plasencia).
2. Big Data. Data analysis and machine learning with Python. Complutense University.
Thanks to this training, students will be able to acquire a deep understanding of how data is obtained, managed and analysed to generate valuable knowledge for decision making. Among other issues, the life cycle of a Big Data project will be shown, including a specific module on open data. In this case, the language chosen for the training will be Python. No previous knowledge is required to attend: it is open to university students, teachers, researchers and professionals from any sector with an interest in the subject.
- Date and place: 30 June to 18 July 2025 in Madrid.
3. Challenges in Data Science: Big Data, Biostatistics, Artificial Intelligence and Communications. University of Valencia.
This programme is designed to help participants understand the scope of the data-driven revolution. Integrated within the Erasmus mobility programmes, it combines lectures, group work and an experimental lab session, all in English. Among other topics, open data, open source tools, Big Data databases, cloud computing, privacy and security of institutional data, text mining and visualisation will be discussed.
- Date and place: from 30 June to 4 July 2025 at two venues in Valencia. Note: places are currently full, but the waiting list is open.
4. Digital twins: from simulation to intelligent reality. University of Castilla-La Mancha.
Digital twins are a fundamental tool for driving data-driven decision-making. With this course, students will be able to understand the applications and challenges of this technology in various industrial and technological sectors. Artificial intelligence applied to digital twins, high performance computing (HPC) and digital model validation and verification, among others, will be discussed. It is aimed at professionals, researchers, academics and students interested in the subject.
- Date and place: 3 and 4 July 2025 in Albacete.
5. Health Geography and Geographic Information Systems: practical applications. University of Zaragoza.
The differential aspect of this course is that it is designed for those students who are looking for a practical approach to data science in a specific sector such as health. It aims to provide theoretical and practical knowledge about the relationship between geography and health. Students will learn how to use Geographic Information Systems (GIS) to analyse and represent disease prevalence data. It is open to different audiences (from students or people working in public institutions and health centres, to neighbourhood associations or non-profit organisations linked to health issues) and does not require a university degree.
- Date and place: 7-9 July 2025 in Zaragoza.
6. Deep into data science. University of Cantabria.
Aimed at scientists and university students (from second year onwards) in engineering, mathematics, physics and computer science, this intensive course aims to provide a complete and practical vision of the current digital revolution. Students will learn about Python programming tools, machine learning, artificial intelligence, neural networks and cloud computing, among other topics. All topics are introduced theoretically and then explored in laboratory practice.
- Date and place: from 7 to 11 July 2025 in Camargo.
7. Advanced Programming. Autonomous University of Barcelona.
Taught entirely in English, the aim of this course is to improve students' programming skills and knowledge through practice. To do so, two games will be developed in two different languages, Java and Python. Students will be able to structure an application and program complex algorithms. It is aimed at students of any degree (mathematics, physics, engineering, chemistry, etc.) who have already started programming and want to improve their knowledge and skills.
- Date and place: 14 July to 1 August 2025, at a location to be defined.
8. Data visualisation and analysis with R. Universidade de Santiago de Compostela.
This course is aimed at beginners in the subject. It will cover the basic functionalities of R, with the aim that students acquire the skills needed to carry out descriptive and inferential statistical analysis (estimation, hypothesis testing and prediction). Search and help tools will also be introduced so that students can learn to use them independently.
- Date and place: from 14 to 24 July 2025 in Santiago de Compostela.
9. Fundamentals of artificial intelligence: generative models and advanced applications. International University of Andalusia.
This course offers a practical introduction to artificial intelligence and its main applications. It covers concepts related to machine learning, neural networks, natural language processing, generative AI and intelligent agents. The language used will be Python, and although the course is introductory, students will get the most out of it if they have basic programming knowledge. It is therefore aimed primarily at undergraduate and postgraduate students in technical areas such as engineering, computer science or mathematics, professionals seeking to acquire AI skills to apply in their industries, and teachers and researchers interested in updating their knowledge of the state of the art in AI.
- Date and place: 19-22 August 2025, in Baeza.
10. Generative AI to innovate in companies: real cases and tools for its implementation. University of the Basque Country.
This course, open to the general public, aims to help participants understand the impact of generative AI in different sectors and its role in digital transformation, through the exploration of real cases of application in companies and technology centres in the Basque Country. It will combine talks, panel discussions and a practical session focused on the use of generative models and techniques such as Retrieval-Augmented Generation (RAG) and fine-tuning.
- Date and place: 10 September 2025 in San Sebastian.
Investing in technology training during the summer is not only an excellent way to strengthen skills, but also to connect with experts, share ideas and discover opportunities for innovation. This selection is just a small sample of what's on offer. If you know of any other courses you would like to share with us, please leave a comment or write to dinamizacion@datos.gob.es.
Open knowledge is knowledge that can be reused, shared and improved by other users and researchers without significant restrictions. This includes data, academic publications, software and other available resources. To explore this topic in more depth, we are joined by representatives of two institutions whose aim is to promote scientific production and make it available in open access for reuse:
- Mireia Alcalá Ponce de León, Information Resources Technician of the Learning, Research and Open Science Area of the Consortium of University Services of Catalonia (CSUC).
- Juan Corrales Corrillero, Manager of the data repository of the Madroño Consortium.
Listen to the full podcast (only available in Spanish)
Summary / Transcript of the interview
1. Can you briefly explain what the institutions you work for do?
Mireia Alcalá: The CSUC is the Consortium of University Services of Catalonia and is an organisation that aims to help universities and research centres located in Catalonia to improve their efficiency through collaborative projects. We are talking about some 12 universities and almost 50 research centres.
We offer services in many areas: scientific computing, e-government, repositories, cloud administration, etc., and we also offer library and open science services, which is the area closest to us. In the area of learning, research and open science, where I work, what we do is try to facilitate the adoption of new methodologies by the university and research system, especially in open science, and we provide support for research data management.
Juan Corrales: The Consorcio Madroño is a consortium of university libraries of the Community of Madrid and the UNED (National University of Distance Education) for library cooperation. We seek to increase the scientific output of the universities that are part of the consortium and also to increase collaboration between the libraries in other areas. Like CSUC, we are also very involved in open science: in promoting it and in providing infrastructures that facilitate it, not only for the members of the Consorcio Madroño but also globally. Apart from that, we also provide other library services and create structures for them.
2. What are the requirements for research to be considered open?
Juan Corrales: For research to be considered open there are many definitions, but perhaps one of the most important is given by the National Open Science Strategy, which has six pillars.
One of them is that both research data and publications, protocols and methodologies must be made available in open access. In other words, everything must be accessible and, in principle, without barriers for everyone, not only for scientists or for universities that can pay for access to these research data or publications. It is also important to use open source platforms that we can customise. Open source software is software that anyone with the necessary knowledge can, in principle, modify, customise and redistribute, in contrast to the proprietary software of many companies, which does not allow any of this. Another important point, although this is still far from being achieved in most institutions, is allowing open peer review, because it lets us know who has done a review, with what comments, and so on. It can be said that it allows the peer review cycle to be redone and improved. A final point is citizen science: allowing ordinary citizens to be part of science, not only within universities or research institutes.
And another important point is adding new ways of measuring the quality of science.
Mireia Alcalá: I agree with what Juan says. I would also add that, for a research process to be considered open, we have to look at it globally, that is, covering the entire data lifecycle. We cannot talk about science being open if we only look at whether the data at the end is open. From the very beginning of the data lifecycle, it is important to use open platforms and to work in a more open and collaborative way.
3. Why is it important for universities and research centres to make their studies and data available to the public?
Mireia Alcalá: I think it is key that universities and centres share their studies, because a large part of research, both here in Spain and at European and world level, is funded with public money. Therefore, if society is paying for the research, it is only logical that it should also benefit from its results. In addition, opening up the research process can help make it more transparent and more accountable. Much of the research done to date has been found to be neither reusable nor reproducible. What does this mean? That in almost 80% of cases someone else cannot take an existing study and reuse its data. Why? Because they do not follow the same standards, the same ways of working, and so on. So I think we have to extend this everywhere, and a clear example is what happened during the pandemic. With COVID-19, researchers from all over the world worked together, sharing data and findings in real time and working in the same way, and science proved to be much faster and more efficient.
Juan Corrales: The key points have already been touched upon by Mireia. It could also be added that bringing science closer to society can make all citizens feel that science is something that belongs to all of us, not just to scientists or academics. It is something we can participate in, and this can also help to stop hoaxes and fake news, to take a more thorough view of the news that reaches us through social networks, and to be able to distinguish what may be real from what may be false.
4. What research should be published openly?
Juan Corrales: Right now, according to the legislation we have in Spain, the latest Science Law, all publications that are mainly financed by public funds, or in which public institutions participate, must be published in open access. In practice this had little impact until last year: although the law came out two years ago, the previous law already said the same, and there is also a law of the Community of Madrid along the same lines. But since last year it has been taken into account in the evaluation that ANECA (the Quality Evaluation Agency) carries out on researchers. Since then, almost all researchers have made it a priority to publish their data and research openly. Publishing data, above all, was something that had not been done until now.
Mireia Alcalá: At the state level it is as Juan says. At the regional level we also have a law from 2022, the Catalan Science Law, which says essentially the same as the Spanish law. But I would also like people to know that we have to take into account not only the legislation, but also the calls for proposals from which the money to fund the projects comes. In Europe, in framework programmes such as Horizon Europe, it is clearly stated that, if you receive funding from the European Commission, you will have to prepare a data management plan at the beginning of your research and publish the data following the FAIR principles.
5. Among other issues, both CSUC and Consorcio Madroño are in charge of supporting entities and researchers who want to make their data available to the public. What should the process of opening research data look like? What are the most common challenges and how do you solve them?
Mireia Alcalá: In our repository, which is called RDR (from Repositori de Dades de Recerca), it is basically the participating institutions that are in charge of supporting the research staff. Researchers tend to arrive at the repository when they are already in the final phase of the research and need to publish the data yesterday, and then everything is much more complex and time-consuming. It takes longer to verify the data and make it findable, accessible, interoperable and reusable.
In our particular case, we have a checklist that every dataset must comply with to ensure a minimum level of data quality, so that it can be reused. We are talking about having persistent identifiers such as ORCID for the researcher or ROR to identify the institutions, having documentation explaining how to reuse the data, having a licence, and so on. Because we have this checklist, researchers improve their processes as they deposit and start to improve the quality of the data from the beginning. It is a slow process. The main challenge, I think, is getting researchers to realise that what they have is data, because most of them do not know it. Most researchers think of data as numbers from a machine that measures air quality, and are unaware that data can also be a photograph, footage from an archaeological excavation, a sound captured in a certain atmosphere, and so on. Therefore, the main challenge is for everyone to understand what data is and that their data can be valuable to others.
And how do we solve it? Trying to do a lot of training, a lot of awareness raising. In recent years, the Consortium has worked to train data curation staff, who are dedicated to helping researchers directly refine this data. We are also starting to raise awareness directly with researchers so that they use the tools and understand this new paradigm of data management.
Juan Corrales: In the Madroño Consortium, until November, the only way to open data was for researchers to pass a form with the data and its metadata to the librarians, and it was the librarians who uploaded it to ensure that it was FAIR. Since November, we also allow researchers to upload data directly to the repository, but it is not published until it has been reviewed by expert librarians, who verify that the data and metadata are of high quality. It is very important that the data is well described so that it can be easily found, reusable and identifiable.
As for the challenges, there are all those mentioned by Mireia, such as researchers often not knowing that they have data. In addition, although ANECA has helped a lot with the new obligations to publish research data, many researchers want to rush their data into the repositories without taking into account that the data must be of good quality: it is not enough simply to deposit it, it is important that it can be reused later.
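The checklist Mireia describes (persistent identifiers such as ORCID and ROR, reuse documentation and a licence) maps naturally onto a dataset-level metadata record. Below is a minimal, illustrative sketch of such a record in Python; all names and identifiers are invented placeholders, not real records.

```python
# Illustrative minimal metadata record for a research dataset.
# Every value here is an invented placeholder, not a real identifier.
dataset_metadata = {
    "title": "Air quality measurements, 2015-2020",
    "author": {"name": "Jane Researcher", "orcid": "https://orcid.org/0000-0000-0000-0000"},
    "affiliation": {"name": "Example University", "ror": "https://ror.org/00example00"},
    "description": "Hourly NO2 and PM10 readings; see README.md for reuse instructions.",
    "license": "CC-BY-4.0",
    "keywords": ["air quality", "open data"],
    "files": ["readings.csv", "README.md"],
}

# A deposit workflow could check the record against a checklist like this before publication.
required = ["title", "author", "license", "description"]
missing = [field for field in required if not dataset_metadata.get(field)]
print("Missing fields:" if missing else "Checklist passed.", missing or "")
```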
6. What activities and tools do you or similar institutions provide to help organisations succeed in this task?
Juan Corrales: From Consorcio Madroño, the repository itself that we use, the tool where the research data is uploaded, makes it easy to make the data FAIR, because it already provides unique identifiers, fairly comprehensive metadata templates that can be customised, and so on. We also have another tool that helps create the data management plans for researchers, so that before they create their research data, they start planning how they're going to work with it. This is very important and has been promoted by European institutions for a long time, as well as by the Science Act and the National Open Science Strategy.
Then, more than the tools, the review by expert librarians is also very important. There are other tools that help assess the quality of a dataset, of research data, such as FAIR EVA or F-UJI, but what we have found is that, in the end, these tools mostly evaluate the quality of the repository, of the software being used and of the metadata requirements placed on researchers, because all our datasets receive fairly high and very similar scores. So what these tools do help us with is improving both the requirements we place on our datasets and the tools we have, in this case the Dataverse software, which is the one we are using.
Mireia Alcalá: In terms of tools and activities we are on a par, because we have had a relationship with the Consorcio Madroño for years, and just like them we have all these tools that help to put the data in the best possible shape right from the start, for example the tool for preparing data management plans. Here at CSUC we have also been working very intensively in recent years to close gaps in the data lifecycle, covering infrastructure, storage, cloud and so on, so that researchers also have somewhere to go while the data is being analysed and managed. After the repository, we move on to all the channels and portals that make it possible to disseminate and make all this science visible, because it makes no sense to build repositories that sit in a silo; they have to be interconnected. For many years now, a lot of work has been done on interoperability protocols and on following the same standards. The data therefore has to be available elsewhere as well, and both Consorcio Madroño and we are present everywhere we can be.
7. Can you tell us a bit more about these repositories you offer? In addition to helping researchers to make their data available to the public, you also offer a space, a digital repository where this data can be housed, so that it can be located by users.
Mireia Alcalá: If we are talking specifically about research data, since we and Consorcio Madroño use the same repository, I will let Juan explain the software and its specifications, and I will focus on the other repositories of scientific output that CSUC offers. Here what we do is coordinate different cooperative repositories according to the type of resource they contain. So we have TDX for theses, RECERCAT for research papers, RACO for scientific journals and MACO for open access monographs. Depending on the type of product, we have a specific repository, because not everything can be in the same place, as each research output has its own particularities. Apart from these cooperative repositories, we also build other spaces for specific institutions, either with a more standard solution or with more customised functionalities. But basically that is it: for each type of research output there is a specific repository adapted to the particularities of that format.
Juan Corrales: In the case of Consorcio Madroño, our repository is called e-cienciaDatos, but it is based on the same software as the CSUC repository, which is Dataverse. It is open source software, so it can be improved and customised. Although in principle its development is managed from Harvard University in the United States, institutions from all over the world participate in it; I believe thirty-odd countries have already contributed to its development.
Among other things, for example, the translations into Catalan have been done by CSUC, the translation into Spanish has been done by Consorcio Madroño, and we have also contributed other small developments. The advantage of this software is that it makes it much easier for the data to be FAIR and compatible with other portals that have much more visibility. CSUC is much larger, but the Consorcio Madroño comprises six universities, and it is rare for someone to look for a dataset directly in the Consorcio Madroño repository, e-cienciaDatos. They usually find it via Google or through a European or international portal. Thanks to the facilities Dataverse provides, they can search from anywhere and still end up finding the data we hold at Consorcio Madroño or at CSUC.
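Since both consortia run their data repositories on Dataverse, their contents can also be queried programmatically through Dataverse's standard Search API, which is part of what makes the datasets findable from outside the institution. The sketch below assumes that standard API and uses a public demo instance as a placeholder base URL; the actual repository addresses should be taken from each consortium's documentation.

```python
import requests

# Placeholder: point this at the Dataverse instance you actually want to query.
BASE_URL = "https://demo.dataverse.org"

# Dataverse Search API: free-text query restricted to published datasets.
response = requests.get(
    f"{BASE_URL}/api/search",
    params={"q": "air quality", "type": "dataset", "per_page": 5},
)
response.raise_for_status()

for item in response.json()["data"]["items"]:
    # Each dataset result carries a name and a persistent identifier (DOI or Handle).
    print(item.get("name"), "-", item.get("global_id"))
```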
8. What other platforms with open research data, at Spanish or European level, do you recommend?
Juan Corrales: For example, at the Spanish level there is FECYT, the Spanish Foundation for Science and Technology, which has a harvester that aggregates the research output of practically all Spanish institutions. The publications of all those institutions appear there: Consorcio Madroño, CSUC and many more.
Then, specifically for research data, there is a lot of research that should be put in a thematic repository, because that's where researchers in that branch of science are going to look. We have a tool to help choose the thematic repository. At the European level there is Zenodo, which has a lot of visibility, but does not have the data quality support of CSUC or the Madroño Consortium. And that is something that is very noticeable in terms of reuse afterwards.
Mireia Alcalá: At the national level, apart from Consorcio Madroño's initiative and our own, data repositories are not yet widespread. We are aware of some initiatives under development, but it is still too early to see their results. However, I do know of some universities that have adapted their institutional repositories so that they can also hold data. While this is a valid solution for those who have no other choice, it has been found that repository software not designed to handle the particularities of data (heterogeneity, formats, diversity, large sizes, etc.) falls somewhat short. Then, as Juan said, at the European level Zenodo is the established multidisciplinary, multi-format repository, born out of a European Commission project. I agree with him that, as it is a self-archiving, self-publishing repository (that is, I, Mireia Alcalá, can go there, spend five minutes uploading any document I have, nobody has looked at it, I fill in the minimum metadata they ask for and I publish it), the quality is clearly very variable. Some things are really usable and perfect, but others need a little more care. As Juan said, at the disciplinary level it is also important to highlight that, in all those areas that have a disciplinary repository, researchers should go there, because that is where they will be able to use the most appropriate metadata, where everybody works in the same way and where everybody knows where to look for that data. For anyone who is interested, there is a directory called re3data, which is basically a directory of all these multidisciplinary and disciplinary repositories. It is a good place to start for anyone who does not know what exists in their discipline.
9. What actions do you consider to be priorities for public institutions in order to promote open knowledge?
Mireia Alcalá: What I would basically say is that public institutions should focus on establishing clear policies on open science, because it is true that we have come a long way in recent years, but there are times when researchers are somewhat bewildered. And beyond policies, it is above all about offering incentives to the entire research community, because many people are making the effort to change the way they work in order to embrace open science, and sometimes they do not see how all that extra effort pays off. So I would say this: policies and incentives.
Juan Corrales: From my point of view, the policies we already have on paper, at the national and regional level, are usually quite correct, quite good. The problem is that often there has been no real attempt to enforce them. Until now, from what we have seen for example with ANECA, which has promoted the use of repositories for research data and articles, they had not really begun to be used on a massive scale. In other words, incentives are necessary; it cannot be just a matter of obligation. As Mireia also said, we have to convince researchers to see open publishing as their own, as something that benefits both them and society as a whole. What I think is most important is precisely that: researcher awareness.
Interview clips
1. Why should universities and researchers share their studies in open formats?
2. What requirements must research meet in order to be considered open?
How can public administrations harness the value of data? This question is not a simple one to address; its answer is conditioned by several factors that have to do with the context of each administration, the data available to it and the specific objectives set.
However, there are reference guides that can help define a path to action. One of them is the Data Innovation Toolkit, published by the European Commission through the EU Publications Office, which serves as a strategic compass for navigating this complex data innovation ecosystem.
This tool is not a simple manual, as it includes templates that make the process easier to implement. Aimed at a variety of profiles, from novice analysts to experienced policy makers and technology innovators, the Data Innovation Toolkit is a useful resource that accompanies you through the process, step by step.
It aims to democratise data-driven innovation by providing a structured framework that goes beyond the mere collection of information. In this post, we will analyse the contents of the European guide, as well as the references it provides for good innovative use of data.
Structure covering the data lifecycle
The guide is organised in four main steps, which address the entire data lifecycle.
- Planning
The first part of the guide focuses on establishing a strong foundation for any data-driven innovation project. Before embarking on any process, it is important to define objectives. To do so, the Data Innovation Toolkit suggests a deep reflection that requires aligning the specific needs of the project with the strategic objectives of the organisation. In this step, stakeholder mapping is also key. This implies a thorough understanding of the interests, expectations and possible contributions of each actor involved. This understanding enables the design of engagement strategies that maximise collaboration and minimise potential conflicts.
To create a proper data innovation team, we can use the RACI matrix (Responsible, Accountable, Consulted, Informed) to define precise roles and responsibilities; a minimal sketch of such a matrix follows the list below. It is not just about bringing professionals together, but about building multidisciplinary teams where each member understands their exact role and contribution to the project. To assist in this task, the guide provides:
- Challenge definition tool: to identify and articulate the key issues they seek to address, summarising them in a single statement.
- Stakeholder mapping tool: to visualise the network of individuals and organisations involved, assessing their influence and interests.
- Team definition tool: to make it easier to identify people in your organisation who can help you.
- Role definition tool: once the necessary profiles have been defined, to determine their responsibilities and role in the data project in more detail, using a RACI matrix.
- Persona definition tool: personas are behavioural archetypes that describe specific types of users. This part of the guide helps to create these detailed profiles, which represent the users or clients who will be involved in the project.
- Data journey mapping tool: to create a concise representation describing, step by step, how a user interacts with their data. The process is depicted from the user's perspective, showing what happens at each stage of the interaction and at each touchpoint.
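As a small illustration of the RACI approach mentioned above, here is a minimal sketch in Python; the tasks, team members and assignments are invented for the example and would of course differ in a real project.

```python
# Minimal sketch of a RACI matrix for a hypothetical data innovation project.
# R = Responsible, A = Accountable, C = Consulted, I = Informed.
raci = {
    "Define the data challenge":  {"Data lead": "A", "Analyst": "R", "Legal": "C", "Comms": "I"},
    "Collect and clean the data": {"Data lead": "A", "Analyst": "R", "Legal": "I", "Comms": "I"},
    "Publish the results":        {"Data lead": "A", "Analyst": "C", "Legal": "C", "Comms": "R"},
}

# Print one line per task so each member's responsibility is explicit.
for task, roles in raci.items():
    assignments = ", ".join(f"{person}: {role}" for person, role in roles.items())
    print(f"{task} -> {assignments}")
```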
- Collection and processing
Once the team has been set up and the objectives have been identified, the data is classified in a way that goes beyond the traditional division between quantitative and qualitative data; a small illustrative sketch follows the list below.
Quantitative scope:
- Discrete data, such as the number of complaints about a public service, represent not just a number but an opportunity to systematically identify areas for improvement. They allow administrations to map recurring problems and design targeted interventions.
- Continuous data, such as response times for administrative procedures, provide a snapshot of operational efficiency. The point is not merely to measure, but to understand the factors behind the variability of these times and to design more agile and efficient processes.
Qualitative scope:
- Nominal (name) data enable the categorisation of public services, allowing for a more structured understanding of the diversity of administrative interventions.
- Ordinal (ordered) data, such as satisfaction ratings, become a prioritisation tool for continuous improvement.
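A minimal pandas sketch of how these four data types can be represented in practice; the figures are invented purely for illustration.

```python
import pandas as pd

# Toy records for a public service, combining the four data types described above.
df = pd.DataFrame({
    "complaints": [3, 0, 7, 2],                                # discrete (quantitative)
    "response_time_days": [4.5, 2.0, 9.3, 6.1],                # continuous (quantitative)
    "service": ["housing", "transport", "health", "housing"],  # nominal (qualitative)
    "satisfaction": pd.Categorical(                            # ordinal (qualitative)
        ["high", "medium", "low", "high"],
        categories=["low", "medium", "high"],
        ordered=True,
    ),
})

print(df.dtypes)
# Ordered categories allow ranking, e.g. finding the lowest satisfaction rating recorded.
print(df["satisfaction"].min())
```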
A series of checklists are available in the document to review this aspect:
- Checklist of data gaps: to identify if there are any gaps in the data to be used and, if so, how to fill them.
- Template for data collection: to align the dataset to the objective of the innovative analysis.
- Checklist of data collection: to ensure access to the data sources needed to run the project.
- Checklist of data quality: to review the quality level of the dataset.
- Data processing checklist: to check that data is being processed securely, efficiently and in compliance with regulations.
- Sharing and analysis
At this point, the Data Innovation Toolkit proposes four analysis strategies that transform data into actionable knowledge (a brief sketch contrasting two of them follows the list):
- Descriptive analysis: goes beyond the simple visualisation of historical data, allowing the construction of narratives that explain the evolution of the phenomena studied.
- Diagnostic analysis: delves deeper into the investigation of causes, unravelling the hidden patterns that explain the observed behaviours.
- Predictive analysis: becomes a strategic planning tool, allowing administrations to prepare for future scenarios.
- Prescriptive analysis: goes a step further, not only projecting trends, but recommending concrete actions based on data modelling.
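To make the contrast between the descriptive and predictive strategies concrete, here is a minimal, self-contained sketch on invented monthly figures; a real project would use the administration's own data and a more careful model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented monthly counts of requests received by a public service.
data = pd.DataFrame({
    "month": range(1, 13),
    "requests": [120, 130, 128, 140, 150, 155, 160, 170, 168, 180, 190, 200],
})

# Descriptive analysis: summarise what has already happened.
print(data["requests"].describe())

# Predictive analysis: fit a simple trend and project the next month.
model = LinearRegression().fit(data[["month"]], data["requests"])
forecast = model.predict(pd.DataFrame({"month": [13]}))[0]
print(f"Forecast for month 13: {forecast:.0f} requests")
```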
In addition to analysis, the ethical dimension is fundamental. The guide therefore sets out strict protocols to ensure secure data transfers, regulatory compliance, transparency and informed consent. For this step, the following templates and checklists are provided:
- Data sharing template: to ensure secure, legal and transparent sharing.
- Checklist for data sharing: to perform all the necessary steps to share data securely, ethically and achieving all the defined objectives.
- Data analysis template: to conduct a proper analysis and obtain insights that are useful and meaningful for the project.
- Use and evaluation
The last stage focuses on converting the insights into real actions. The communication of results, the definition of key performance indicators (KPIs), impact measurement and scalability strategies become tools for continuous improvement.
A collaborative resource in continuous improvement
In short, the toolkit offers a comprehensive transformation: from evidence-based decision making to personalising public services, increasing transparency and optimising resources. You can also consult the checklists and tools available for this section, which are:
- Checklist for data use: to review that the data and the conclusions drawn are used in an effective, accountable and goal-oriented manner.
- Data innovation through KPI tool: to define the KPIs that will measure the success of the process.
- Impact measurement and success evaluation tools: to assess the success and impact of the innovation in the data project.
- Data innovation scalability plan: to identify strategies to scale the project effectively.
In addition, the toolkit's repository of innovation and data resources is a dynamic catalogue of knowledge, including expert articles, implementation guides, case studies and learning materials.
You can access here the list of materials provided by the Data Innovation Toolkit.
You can even contact the development team if you have any questions or would like to contribute to the repository.
To conclude, harnessing the value of data with an innovative perspective is not a magic leap, but a gradual and complex process. On this path, the Data Innovation Toolkit can be useful as it offers a structured framework. Effective implementation will require investment in training, cultural adaptation and long-term commitment.
Did you know that data science skills are among the most in-demand skills in business? In this podcast, we tell you how you can train yourself in this field in a self-taught way. To do so, we are joined by two data science experts:
- Juan Benavente, industrial and computer engineer with more than 12 years of experience in technological innovation and digital transformation. He has also been training new professionals in technology schools, business schools and universities for years.
- Alejandro Alija, PhD in physics, data scientist and expert in digital transformation. In addition to his extensive professional experience focused on the Internet of Things (IoT), Alejandro also works as a lecturer at different business schools and universities.
Listen to the full podcast (only available in Spanish)
Summary / Transcript of the interview
1. What is data science? Why is it important and what can it do for us?
Alejandro Alija: Data science could be defined as a discipline whose main objective is to understand the world, the processes of business and life, by analysing and observing data. In the last 20 years it has gained exceptional relevance due to the explosion in data generation, driven mainly by the arrival of the internet and the connected world.
Juan Benavente: The term data science has evolved since its inception. Today, a data scientist is the person who is working at the highest level in data analysis, often associated with the building of machine learning or artificial intelligence algorithms for specific companies or sectors, such as predicting or optimising manufacturing in a plant.
The profession is evolving rapidly, and is likely to fragment in the coming years. We have seen the emergence of new roles such as data engineers or MLOps specialists. The important thing is that today any professional, regardless of their field, needs to work with data. There is no doubt that any position or company requires increasingly advanced data analysis. It doesn't matter if you are in marketing, sales, operations or at university. Anyone today is working with, manipulating and analysing data. If we also aspire to data science, which would be the highest level of expertise, we will be in a very beneficial position. But I would definitely recommend any professional to keep this on their radar.
2. How did you get started in data science and what do you do to keep up to date? What strategies would you recommend for both beginners and more experienced profiles?
Alejandro Alija: My basic background is in physics, and I did my PhD in basic science. In fact, it could be said that any scientist, by definition, is a data scientist, because science is based on formulating hypotheses and proving them with experiments and theories. My relationship with data started early in academia. A turning point in my career was when I started working in the private sector, specifically in an environmental management company that measures and monitors air pollution. The environment is a field that is traditionally a major generator of data, especially as it is a regulated sector where administrations and private companies are obliged, for example, to record air pollution levels under certain conditions. I found historical series up to 20 years old that were available for me to analyse. From there my curiosity began and I specialised in concrete tools to analyse and understand what is happening in the world.
Juan Benavente: I can identify with what Alejandro said because I am not a computer scientist either. I trained in industrial engineering and, although computer science is one of my interests, it was not my foundation. Nowadays, by contrast, I do see more specialists being trained at university level. A data scientist today carries many skills, such as statistics, mathematics and the ability to understand everything that goes on in the industry. I have acquired this knowledge through practice. As for keeping up to date, I think that in many cases you can stay in contact with companies that are innovating in this field. A lot can also be learned at industry or technology events. I started out in smart cities and have gradually moved into the industrial world, learning little by little.
Alejandro Alija: To add another source for keeping up to date, apart from what Juan has said, I think it is important to watch the technology manufacturers and the market players. They are a very useful source of information for staying current: identify their future strategies and what they are betting on.
3. If someone with little or no technical knowledge wants to learn data science, where do they start?
Juan Benavente: In training, I have come across very different profiles: from people who have just graduated from university to people trained in very different fields who find in data science an opportunity to reinvent themselves and dedicate themselves to it. Thinking of someone who is just starting out, I think the best thing to do is to put your knowledge into practice. In projects I have worked on, we defined the methodology in three phases: a first phase covering the more theoretical aspects, taking in the mathematics, programming and everything a data scientist needs to know; then, once you have those basics, the sooner you start working and practising those skills, the better. I believe practice sharpens the wit and, both to keep up to date and to train yourself and acquire useful knowledge, the sooner you get into a project, the better. Even more so in a field that is updated so frequently. In recent years, the emergence of generative AI has brought further opportunities, including for new profiles who want to train. Even if you are not an expert in programming, there are tools that can help you with programming, and the same goes for mathematics or statistics.
Alejandro Alija: To complement what Juan says from a different perspective, I think it is worth highlighting the evolution of the data science profession. I remember when the article about "the sexiest profession in the world" became famous and went viral, but then things settled. The first settlers in the world of data science did not come so much from computer science or informatics. There were more outsiders: physicists and mathematicians, with a strong background in mathematics and physics, and even some engineers whose work and professional development meant that they ended up using many tools from the computer science field. Gradually, it has become more balanced. It is now a discipline that still has those two strands: people who come from the world of physics and mathematics towards the data side, and people who come with programming skills. Everyone knows what they have to balance in their toolbox. Thinking about a junior profile who is just starting out, one very important thing, and we see this when we teach, is programming skills. I would say that having programming skills is not just a plus, but a basic requirement for advancing in this profession. It is true that some people can do well without strong programming skills, but I would argue that a beginner needs those first programming skills with a basic toolset. We are talking about languages such as Python and R, which are the headline languages. You do not need to be a great coder, but you do need some basic knowledge to get started. Then, of course, specific training in the mathematical foundations of data science is crucial. Fundamental statistics and more advanced statistics are complements that, if present, will move a person along the data science learning curve much faster. Thirdly, I would say that specialisation in particular tools is important. Some people are more oriented towards data engineering, others towards the modelling world. Ideally, specialise in a few frameworks and use them together, as optimally as possible.
4. In addition to teaching, you both work in technology companies. What technical certifications are most valued in the business sector and what open sources of knowledge do you recommend to prepare for them?
Juan Benavente: Personally, it is not what I look at most, but I think certifications can be relevant, especially for people who are starting out and need help structuring their approach to the problem and understanding it. I recommend certifications in technologies that are in use in whatever company you want to end up working for, especially from cloud computing providers and widely used data analytics tools. These are the certifications I would recommend for someone who wants to approach this world and needs a structure to help them. When you do not have a knowledge base, it can be a bit confusing to know where to start. Perhaps you should reinforce programming or mathematical knowledge first, but it can all seem somewhat complicated. Where these certifications certainly help is, in addition to reinforcing concepts, in ensuring that you are progressing well and getting to know the typical ecosystem of tools you will work with tomorrow. It is not just about theoretical concepts, but about knowing the ecosystems you will encounter when you start working, whether you are starting your own company or joining an established one, be it Microsoft, Amazon or other providers of such solutions. This allows you to focus more quickly on the work itself, and less on all the tooling that surrounds it. I believe this type of certification is useful, especially for profiles approaching this world with enthusiasm. It will help them both to structure their learning and to land well in their professional destination. Such certifications are also likely to be valued in selection processes.
Alejandro Alija: If someone listens to us and wants more specific guidelines, it could be structured in blocks. There are a series of massive online courses that, for me, were a turning point. In my early days, I tried to enrol in several of these courses on platforms such as Coursera, edX, where even the technology manufacturers themselves design these courses. I believe that this kind of massive, self-service, online courses provide a good starting base. A second block would be the courses and certifications of the big technology providers, such as Microsoft, Amazon Web Services, Google and other platforms that are benchmarks in the world of data. These companies have the advantage that their learning paths are very well structured, which facilitates professional growth within their own ecosystems. Certifications from different suppliers can be combined. For a person who wants to go into this field, the path ranges from the simplest to the most advanced certifications, such as being a data solutions architect or a specialist in a specific data analytics service or product. These two learning blocks are available on the internet, most of them are open and free or close to free. Beyond knowledge, what is valued is certification, especially in companies looking for these professional profiles.
5. In addition to theoretical training, practice is key, and one of the most interesting ways of learning is to replicate exercises step by step. In this regard, at datos.gob.es we offer didactic resources, many of them developed by you as experts in the project. Can you tell us what these exercises consist of and how they are approached?
Alejandro Alija: The approach we always took was designed for a broad audience, without complex prerequisites. We wanted any user of the portal to be able to replicate the exercises, although clearly the more knowledge you have, the more you can get out of them. Each exercise has a well-defined structure: a documentary section, usually a post or a report describing what the exercise consists of, what materials are needed, what the objectives are and what it is intended to achieve. In addition, we accompany each exercise with two further resources. The first is a code repository where we upload the necessary materials, with a brief description and the code of the exercise; it can be a Python notebook, a Jupyter Notebook or a simple script, containing the technical content. The other fundamental element is aimed at making the exercises easy to run. In data science and programming, non-specialist users often find it difficult to set up a working environment. A Python exercise, for example, requires having a programming environment installed, knowing the necessary libraries and making configurations that are trivial for professionals but can be very complex for beginners. To mitigate this barrier, we publish most of our exercises on Google Colab, a wonderful, open tool. Google Colab is a web-based programming environment where the user only needs a browser. Essentially, Google provides a virtual computer on which we can run our programs and exercises without any special configuration. The important thing is that each exercise is ready to use, and we always check it in this environment, which makes learning much easier for beginners or less technically experienced users.
Juan Benavente: Yes, we always take a user-oriented, step-by-step approach, trying to make everything open and accessible. The aim is for anyone to be able to run an exercise without complex configuration, focusing on topics as close to reality as possible. We often take advantage of open data published by bodies such as the DGT to carry out realistic analyses. We have developed very interesting exercises, such as energy market predictions or analyses of critical materials for batteries and electronics, which allow you to learn not only about the technology but also about the specific subject matter. You can get down to work right away, not only to learn, but also to find out about the topic itself.
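For a flavour of what such an exercise looks like, here is a minimal, self-contained cell that runs as-is in Google Colab or any Jupyter environment. The figures are synthetic stand-ins invented for this sketch; the real exercises on datos.gob.es load published open datasets instead.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for an open dataset: hourly traffic counts over one day.
traffic = pd.DataFrame({
    "hour": list(range(24)),
    "vehicles": [120, 90, 70, 60, 80, 150, 400, 700, 650, 500, 480, 520,
                 560, 540, 530, 550, 620, 720, 680, 520, 400, 300, 220, 160],
})

# A typical first step in the exercises: explore, summarise and visualise the data.
print(traffic["vehicles"].describe())
traffic.plot(x="hour", y="vehicles", title="Vehicles per hour (synthetic example)")
plt.show()
```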
6. In closing, we would like you to offer a piece of advice that is more about attitude than technique. What would you say to someone starting out in data science?
Alejandro Alija: As an attitude tip for someone starting out in data science, I suggest being brave. There is no need to worry about being unprepared, because in this field everything is still to be done and anyone can contribute value. Data science is multi-faceted: there are professionals closer to the business world who can provide valuable insights, and others who are more technical and need to understand the context of each area. My advice is to make do with the resources available without panicking, because, although the path may seem complex, the opportunities are enormous. As a technical tip, it is important to be sensitive to how data is produced and used: the better one understands this world, the smoother the approach to projects will be.
Juan Benavente: I endorse the advice to be brave and add a reflection on programming: many people find the theoretical concept attractive, but when they get to practice and see the complexity of programming, some are discouraged by lack of prior knowledge or different expectations. It is important to add the concepts of patience and perseverance. When you start in this field, you are faced with multiple areas that you need to master: programming, statistics, mathematics, and specific knowledge of the sector you will be working in, be it marketing, logistics or another field. The expectation of becoming an expert quickly is unrealistic. It is a profession that, although it can be started without fear and by collaborating with professionals, requires a journey and a learning process. You have to be consistent and patient, managing expectations appropriately. Most people who have been in this world for a long time agree that they have no regrets about going into data science. It is a very attractive profession where you can add significant value, with an important technological component. However, the path is not always straightforward. There will be complex projects, moments of frustration when analyses do not yield the expected results or when working with data proves more challenging than expected. But looking back, few professionals regret having invested time and effort in training and developing in this field. In summary, the key tips are: courage to start, perseverance in learning and development of programming skills.
Interview clips
1. Is it worth studying data science?
2. How are the data science exercises on datos.gob.es approached?
3. What is data science? What skills are required?
There is no doubt that artificial intelligence has become a fundamental pillar of technological innovation. Today, artificial intelligence (AI) can create chatbots specialised in open data, applications that facilitate professional work and even a digital Earth model to anticipate natural disasters.
The possibilities are endless; however, the future of AI also involves challenges that must be overcome to make models more inclusive, accessible and transparent. In this respect, the European Union is developing various initiatives to make progress in this field.
European regulatory framework for a more open and transparent AI
The EU's approach to AI seeks to give citizens the confidence to adopt these technologies and to encourage businesses to develop them. To this end, the European AI Regulation sets out guidelines for the development of artificial intelligence in line with European values of privacy, security and cultural diversity. On the other hand, the Data Governance Regulation (DGA) defines that broad access to data must be guaranteed without compromising intellectual property rights, privacy and fairness.
Together with the Artificial Intelligence Act, the update of the Coordinated Plan on AI ensures the security and fundamental rights of individuals and businesses, while strengthening investment and innovation in all EU countries. The Commission has also launched an Artificial Intelligence Innovation Package to help European start-ups and SMEs develop reliable AI that respects EU values and standards.
Other institutions are also working to boost artificial intelligence by promoting open source AI models as a very interesting solution. A recent report by Open Future and the Open Source Initiative (OSI) defines what data governance should look like in open source AI models. One of the challenges highlighted in the report is precisely striking a balance between open data and data rights, in order to achieve more transparency and avoid cultural bias. In fact, experts in the field Ricard Martínez and Carmen Torrijos debated this issue in the datos.gob.es podcast.
The OpenEuroLLM project
To address these challenges with an innovative and open solution, the European Union, through the Digital Europe programme, has presented OpenEuroLLM, an open source artificial intelligence project that is expected to deliver efficient, transparent language models aligned with European AI regulations.
The OpenEuroLLM project has as its main goal the development of state-of-the-art language models for a wide variety of public and private applications. Among the most important objectives, we can mention the following:
- Extend the multilingual capabilities of existing models: this includes not only the official languages of the European Union, but also other languages that are of social and economic interest. Europe is a continent rich in linguistic diversity, and the project seeks to reflect this diversity in AI models.
- Sustainable access to foundation models: the models developed within the project will be easy to access and ready to be adapted to various applications (see the sketch after this list). This will benefit not only large enterprises, but also small and medium-sized enterprises (SMEs) that wish to integrate AI into their processes without facing technological barriers.
- Evaluation of results and alignment with European regulations: models will be evaluated according to rigorous safety standards and alignment with the European AI Regulation and other European regulatory frameworks. This will ensure that AI solutions are safe and respect fundamental rights.
- Transparency and accessibility: One of the premises of the project is to openly share the tools, processes and intermediate results of the training processes. This will allow other researchers and developers to reproduce, improve and adapt the models for their own purposes.
- Community building: OpenEuroLLM is not limited to modelling but also aims to build an active and engaged community, both in the public and private sector, that can collaborate, share knowledge and work together to advance AI research.
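Once the project publishes its checkpoints, access to an openly licensed model will typically look something like the sketch below, shown here with the Hugging Face transformers library. The model identifier is a placeholder invented for this example, since OpenEuroLLM has not yet released its models; substitute the real repository name when it becomes available.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifier: replace with the actual model repository once it is released.
MODEL_ID = "open-euro-llm/placeholder-model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# A multilingual prompt, reflecting the project's focus on European language coverage.
inputs = tokenizer("La ciencia abierta permite", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```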
The OpenEuroLLM Consortium: a collaborative and multinational project
The OpenEuroLLM project is being developed by a consortium of 20 European research institutions, technology companies and supercomputing centres, coordinated by Charles University (Czech Republic) with the collaboration of Silo GenAI (Finland). The consortium brings together some of the leading institutions and companies in the field of artificial intelligence in Europe, creating a multinational collaboration to develop open source language models.
The main institutions participating in the project include renowned universities such as the University of Helsinki (Finland) and the University of Oslo (Norway), as well as technology companies such as Aleph Alpha Research (Germany) and the Elche-based company Prompsit (Spain), among others. In addition, supercomputing centres such as the Barcelona Supercomputing Center (Spain) and SURF (The Netherlands) provide the infrastructure needed to train large-scale models.
Linguistic diversity, transparency and compliance with EU standards
One of the biggest challenges of globalised artificial intelligence is the inclusion of multiple languages and the preservation of cultural differences. Europe, with its vast linguistic diversity, presents a unique environment in which to address these issues. OpenEuroLLM is committed to preserving this diversity and ensuring that the AI models developed are sensitive to the linguistic and cultural variations of the region.
As we saw at the beginning of this post, technological development must go hand in hand with ethical and responsible values. In this respect, one of the key features of the OpenEuroLLM project is its focus on transparency. Models, data, documentation, training code and evaluation metrics will be fully available to the public. This will allow researchers and developers to audit, modify and improve the models, ensuring an open and collaborative approach.
In addition, the project is aligned with strict European AI regulations. OpenEuroLLM is designed to comply with the EU's AI Act, which sets stringent criteria to ensure safety, fairness and privacy in artificial intelligence systems.
Democratising access to AI
One of the most important achievements of OpenEuroLLM is the democratisation of access to high-performance AI. Open source models will enable businesses, academic institutions and public sector organisations across Europe to access cutting-edge technology, regardless of their size or budget.
This is especially relevant for small and medium-sized enterprises (SMEs), which often face difficulties in accessing AI solutions due to high licensing costs or technological barriers. OpenEuroLLM will remove these barriers and enable companies to develop innovative products and services using AI, which will contribute to Europe's economic growth.
The OpenEuroLLM project also embodies the EU's commitment to digital sovereignty: a strategic investment in technological infrastructure that reduces dependence on global players and strengthens European competitiveness in the field of artificial intelligence. This is an important step towards artificial intelligence that is not only more advanced, but also fairer, safer and more responsible.
There is no doubt that digital skills training is necessary today. Basic digital skills are essential to be able to interact in a society where technology already plays a cross-cutting role. In particular, it is important to know the basics of the technology for working with data.
In this context, public sector workers must also keep themselves constantly updated. Training in this area is key to optimising processes, ensuring information security and strengthening trust in institutions.
In this post, we identify digital skills related to open data, covering both its publication and its use. We not only review the professional competencies that public employees working with open data must acquire and maintain, but also compile a series of training resources available to them.
Professional competencies for working with data
A working group was set up at the 2024 National Open Data Gathering with one objective: to identify the digital competencies required of public administration professionals working with open data. Beyond the conclusions of this event of national relevance, the working group defined the profiles and roles needed for data opening, gathering information on their functions and the skills and knowledge required. The main roles identified were:
- Responsible role: holds technical responsibility for promoting open data policies and organises activities to define policies and data models. Some of the skills required are:
- Leadership in promoting strategies to drive data openness.
- Driving the data strategy to promote openness with purpose.
- Understand the regulatory framework related to data in order to act within the law throughout the data lifecycle.
- Encourage the use of tools and processes for data management.
- Ability to generate synergies in order to reach a consensus on cross-cutting instructions for the entire organisation.
- Technical data opening role (ICT profile): carries out implementation activities more closely linked to the management of systems, extraction processes, data cleansing, etc. This profile must have knowledge of, for example:
- How to structure the dataset, the metadata vocabulary, data quality, strategy to follow...
- Ability to analyse a dataset and quickly and intuitively identify the cleansing and refinement processes it requires.
- Generating data visualisations, connecting databases of different formats and origins to obtain dynamic and interactive graphs, indicators and maps (see the sketch after this list).
- Master the functionalities of the platform, i.e. know how to apply technological solutions for open data management or know techniques and strategies to access, extract and integrate data from different platforms.
- Open data functional role (service technician): carries out activities related to the selection of data to be published, quality assurance, promotion of open data, visualisation, data analytics, etc. For example:
- Handling visualisation and dynamisation tools.
- Knowing the data economy and understanding data-related information in its full extent (generation by public administrations, open data, infomediaries, reuse of public sector information, Big Data, data-driven approaches, roles involved, etc.).
- To know and apply the ethical and personal data protection aspects that apply to the opening of data.
- Data use by public workers: this profile carries out activities involving the use of data for decision-making and basic data analytics, among others. To do so, it must have these competences:
- Navigation, search and filtering of data.
- Data assessment.
- Data storage and export.
- Data analysis and exploitation.
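Many of the skills listed above (loading datasets, cleansing them and producing quick visualisations) can be practised with general-purpose open-source tools. Below is a minimal, illustrative sketch in Python using pandas and matplotlib; the file name and column names are hypothetical and stand in for any open dataset published as CSV.

```python
# Illustrative sketch of basic open data skills: loading, cleansing and visualising
# a dataset. The file name and column names ("no2", "municipality") are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

# Load a (hypothetical) open dataset published as CSV
df = pd.read_csv("air_quality_stations.csv")

# Basic cleansing: normalise column names, remove duplicates, handle missing values
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df = df.drop_duplicates()
df["no2"] = pd.to_numeric(df["no2"], errors="coerce")
df = df.dropna(subset=["no2", "municipality"])

# Quick exploration and visualisation: average NO2 levels by municipality
summary = df.groupby("municipality")["no2"].mean().sort_values(ascending=False)
summary.head(10).plot(kind="bar", title="Average NO2 by municipality (top 10)")
plt.tight_layout()
plt.show()
```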
In addition, as part of this challenge to increase capacities for open data, a list of free training courses and guides on open data and data analytics was developed. We compile below some of those that are available online and in open format.
| Institution | Resources | Link | Level |
|---|---|---|---|
| Knight Center for Journalism in the Americas | Data journalism and visualisation with free tools | https://journalismcourses.org/es/course/dataviz/ | Beginner |
| Data Europa Academy | Introduction to open data | https://data.europa.eu/en/academy/introducing-open-data | Beginner |
| Data Europa Academy | Understanding the legal side of open data | https://data.europa.eu/en/academy/understanding-legal-side-open-data | Beginner |
| Data Europa Academy | Improve the quality of open data and metadata | https://data.europa.eu/en/academy/improving-open-data-and-metadata-quality | Advanced |
| Data Europa Academy | Measuring success in open data initiatives | https://data.europa.eu/en/training/elearning/measuring-success-open-data-initiatives | Advanced |
| Escuela de Datos | Data Pipeline Course | https://escueladedatos.online/curso/curso-tuberia-de-datos-data-pipeline/ | Intermediate |
| FEMP | Strategic guidance for its implementation - Minimum data sets to be published | https://redtransparenciayparticipacion.es/download/guia-estrategica-para-su-puesta-en-marcha-conjuntos-de-datos-minimos-a-publicar/ | Intermediate |
| Datos.gob.es | Methodological guidelines for data opening | /es/conocimiento/pautas-metodologicas-para-la-apertura-de-datos | Beginner |
| Datos.gob.es | Practical guide to publishing open data using APIs | /es/conocimiento/guia-practica-para-la-publicacion-de-datos-abiertos-usando-apis | Intermediate |
| Datos.gob.es | Practical guide to publishing spatial data | /es/conocimiento/guia-practica-para-la-publicacion-de-datos-espaciales | Intermediate |
| Junta de Andalucía | Processing datasets with Open Refine | https://www.juntadeandalucia.es/datosabiertos/portal/tutoriales/usar-openrefine.html | Beginner |
Figure 1. Table of training resources (own elaboration). Source: https://encuentrosdatosabiertos.es/wp-content/uploads/2024/05/Reto-2.pdf
INAP's continuing professional development training offer
The Instituto Nacional de Administración Pública (INAP) has a Training Activities Programme for 2025, framed in the INAP Learning Strategy 2025-2028. This training catalogue includes more than 180 activities organised in different learning programmes, which will take place throughout the year with the aim of strengthening the competences of public staff in key areas such as open data management and the use of related technologies.
INAP's 2025 training programme offers a wide range of courses aimed at improving digital skills and open data literacy. Some of the highlighted trainings include:
- Fundamentals and tools of data analysis.
- Introduction to Oracle SQL.
- Open data and re-use of information.
- Data analysis and visualisation with Power BI.
- Blockchain: technical aspects.
- Advanced Python programming.
These courses, aimed at different profiles of public employees, from open data managers to information management technicians, allow them to acquire knowledge of data extraction, processing and visualisation, as well as strategies for the opening and reuse of open data in the Public Administration. You can consult the full catalogue here.
Other training references
Other public administrations and entities also offer training courses related to open data. For more information on their training offer, please see each catalogue of scheduled courses:
- FEMP's Network of Local Entities for Transparency and Citizen Participation: https://redtransparenciayparticipacion.es/.
- Government of Aragon: Aragon Open Data: https://opendata.aragon.es/informacion/eventos-de-datos-abiertos
- School of Public Administration of Catalonia (EAPC): https://eapc.gencat.cat/ca/inici/index.html#googtrans(ca|es
- Diputació de Barcelona: http://aplicacions.diba.cat/gestforma/public/cercador_baf_ens_locals
- Instituto Geográfico Nacional (IGN): https://cursos.cnig.es/
In short, training in digital skills in general, and in open data in particular, is a practice that we recommend at datos.gob.es. Do you need a specific training resource? Write to us in the comments; we'll read you!
As we do every year, the datos.gob.es team wishes you happy holidays. If this Christmas you feel like giving or giving yourself a gift of knowledge, we bring you our traditional Christmas letter with ideas to ask Father Christmas or the Three Wise Men.
We have a selection of books on a variety of topics such as data protection, new developments in AI or the great scientific discoveries of the 20th century. All these recommendations, ranging from essays to novels, will be a sure hit to put under the tree.
Maniac by Benjamin Labatut.
- What is it about? Guided by the figure of John von Neumann, one of the great geniuses of the 20th century, the book covers topics such as the creation of atomic bombs, the Cold War, the birth of the digital universe and the rise of artificial intelligence. The story begins with the tragic suicide of Paul Ehrenfest and progresses through the life of von Neumann, who foreshadowed the arrival of a technological singularity. The book culminates in a confrontation between man and machine in an epic showdown in the game of Go, which serves as a warning about the future of humanity and its creations.
- Who is it aimed at? This science fiction novel is aimed at anyone interested in the history of science, technology and its philosophical and social implications. It is ideal for those who enjoy narratives that combine thriller with deep reflections on the future of humanity and technological progress. It is also suitable for those looking for a literary work that delves into the limits of thought, reason and artificial intelligence.
Take control of your data, by Alicia Asin.
- What is it about? This book compiles resources to better understand the digital environment in which we live, using practical examples and clear definitions that make it easier for anyone to understand how technologies affect our personal and social lives. It also invites us to be more aware of the consequences of the indiscriminate use of our data, from the digital trail we leave behind or the management of our privacy on social networks, to trading on the dark web. It also warns about the legitimate but sometimes invasive use of our online behaviour by many companies.
- Who is it aimed at? The author of this book is the CEO of the data reuse company Libelium; she took part in one of our Encuentros Aporta and is a leading expert on privacy, the appropriate use of data and data spaces, among other topics. In this book, she offers a business perspective in a work aimed at the general public.
Governance, management and quality of artificial intelligence by Mario Geraldo Piattini.
- What is it about? Artificial intelligence is increasingly present in our daily lives and in the digital transformation of companies and public bodies, offering both benefits and potential risks. In order to benefit properly from the advantages of AI and avoid problems it is very important to have ethical, legal and responsible systems in place. This book provides an overview of the main standards and tools for managing and assuring the quality of intelligent systems. To this end, it provides clear examples of best available practices.
- Who is it aimed at? Although anyone can read it, the book provides tools to help companies meet the challenges of AI by creating systems that respect ethical principles and align with engineering best practices.
Nexus, by Yuval Noah Harari.
- What is it about? In this new instalment, one of the most fashionable writers analyses how information networks have shaped human history, from the Stone Age to the present era. This essay explores the relationship between information, truth, bureaucracy, mythology, wisdom and power, and how different societies have used information to impose order, with both positive and negative consequences. In this context, the author discusses the urgent decisions we must make in the face of current threats, such as the impact of non-human intelligence on our existence.
- Who is it aimed at? It is a mainstream work, i.e. anyone can read it and will most likely enjoy reading it. It is a particularly attractive option for readers seeking to reflect on the role of information in modern society and its implications for the future of humanity, in a context where emerging technologies such as artificial intelligence are challenging our way of life.
Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play by David Foster (second edition 2024)
- What is it about? This practical book dives into the fascinating world of generative deep learning, exploring how machines can create art, music and text. Throughout, Foster guides us through the most innovative architectures, such as VAEs, GANs and diffusion models, explaining how these technologies can transform photographs, generate music and even write text. The book starts with the basics of deep learning and progresses to cutting-edge applications, including image creation with Stable Diffusion, text generation with GPT and music composition with MuseGAN. It is a work that combines technical rigour with artistic creativity.
- Who is it aimed at? This technical manual is intended for machine learning engineers, data scientists and developers who want to enter the field of generative deep learning. It is ideal for those who already have a background in programming and machine learning, and wish to explore how machines can create original content. It will also be valuable for creative professionals interested in understanding how AI can amplify their artistic capabilities. The book strikes the perfect balance between mathematical theory and practical implementation, making complex concepts accessible through concrete examples and working code.
Information is beautiful, by David McCandless.
- What is it about? This visual guide, in English, helps us understand how the world works through striking infographics and data visualisations. This new edition has been completely revised, with more than 20 updates and 20 new visualisations. It presents information in a way that is easy to skim, but also invites further exploration.
- Who is it aimed at? This book is aimed at anyone interested in seeing and understanding information in a different way. It is perfect for those looking for an innovative and visually appealing way to understand the world around us. It is also ideal for those who enjoy exploring data, facts and their interrelationships in an entertaining and accessible way.
Collecting Field Data with QGIS and Mergin Maps, by Kurt Menke and Alexandra Bucha Rasova.
- What is it about? This book teaches you how to master the Mergin Maps platform for collecting, sharing and managing field data using QGIS. The book covers everything from the basics, such as setting up projects in QGIS and conducting field surveys, to advanced workflows for customising projects and managing collaborations. In addition, details on how to create maps, set up survey layers and work with smart forms for data collection are included.
- Who is it aimed at? Although it is a somewhat more technical option than the previous proposals, the book is aimed at new users of Mergin Maps and QGIS. It is also useful for those who are already familiar with these tools and are looking for more advanced workflows.
A terrible greenery by Benjamin Labatut.
- What is it about? This book is a fascinating blend of science and literature, narrating scientific discoveries and their implications, both positive and negative. Through powerful stories, such as the creation of Prussian blue and its connection to chemical warfare, the mathematical explorations of Grothendieck and the struggle between scientists like Schrödinger and Heisenberg, the author, Benjamin Labatut, leads us to explore the limits of science, the follies of knowledge and the unintended consequences of scientific breakthroughs. The work turns science into literature, presenting scientists as complex and human characters.
- Who is it aimed at? The book is aimed at a general audience interested in science, the history of discoveries and the human stories behind them, with a focus on those seeking a literary and in-depth approach to scientific topics. It is ideal for those who enjoy works that explore the complexity of knowledge and its effects on the world.
Designing Better Maps: A Guide for GIS Users, by Cynthia A. Brewer.
- What is it about? It is a guide in English, written by an expert cartographer, that teaches how to create successful maps using any GIS or illustration tool. Through its 400 full-colour illustrations, the book covers the best cartographic design practices applied to both reference and statistical maps. Topics include map planning, using base maps, managing scale and time, explaining maps, publishing and sharing, using typography and labels, understanding and using colour, and customising symbols.
- Who is it aimed at? This book is intended for all geographic information systems (GIS) users, from beginners to advanced cartographers, who wish to improve their map design skills.
Although we include many purchase links in this post, if you are interested in any of these options we encourage you to ask for them at your local bookshop, to support small businesses during the festive season. Do you know of any other interesting titles? Write it in the comments or send it to dinamizacion@datos.gob.es. We read you!
Citizen science is consolidating itself as one of the most relevant sources of reference in contemporary research. This is recognised by the Consejo Superior de Investigaciones Científicas (CSIC), which defines citizen science as a methodology and a means for the promotion of scientific culture in which science and citizen participation strategies converge.
We talked some time ago about the importance of citizen science in society. Today, citizen science projects have not only increased in number, diversity and complexity, but have also driven a significant process of reflection on how citizens can actively contribute to the generation of data and knowledge.
To reach this point, programmes such as Horizon 2020, which explicitly recognised citizen participation in science, have played a key role. More specifically, the chapter "Science with and for society" gave an important boost to this type of initiative in Europe and also in Spain. In fact, as a result of Spanish participation in this programme, as well as in parallel initiatives, Spanish projects have been increasing in size and connections with international initiatives.
This growing interest in citizen science also translates into concrete policies. An example of this is the current Spanish Strategy for Science, Technology and Innovation (EECTI) for the period 2021-2027, which includes "the social and economic responsibility of R&D&I through the incorporation of citizen science".
As we commented some time ago, citizen science initiatives seek to encourage a more democratic science that responds to the interests of all citizens and generates information that can be reused for the benefit of society. Here are some examples of citizen science projects that help collect data whose reuse can have a positive impact on society:
AtmOOs Academic Project: Education and citizen science on air pollution and mobility.
In this programme, Thigis developed a citizen science pilot on mobility and the environment with pupils from a school in Barcelona's Eixample district. This project, which is already replicable in other schools, consists of collecting data on student mobility patterns in order to analyse issues related to sustainability.
On the AtmOOs Academic website you can visualise the results of all the editions carried out annually since the 2017-2018 academic year, which show information on the vehicles students use to get to class and the emissions generated by school stage.
WildINTEL: Research project on wildlife monitoring in Huelva
The University of Huelva and the Spanish National Research Council (CSIC) are collaborating to build a wildlife monitoring system to obtain essential biodiversity variables. To do this, remote-capture camera traps and artificial intelligence are used.
The wildINTEL project focuses on the development of a monitoring system that is scalable and replicable, thus facilitating the efficient collection and management of biodiversity data. This system will incorporate innovative technologies to provide accurate and objective demographic estimates of populations and communities.
The project, which started in December 2023 and will run until December 2026, is expected to provide tools and products to improve the management of biodiversity, not only in the province of Huelva but throughout Europe.
IncluScience-Me: Citizen science in the classroom to promote scientific culture and biodiversity conservation.
This citizen science project combining education and biodiversity arises from the need to address scientific research in schools. To do this, students take on the role of a researcher to tackle a real challenge: to track and identify the mammals that live in their immediate environment to help update a distribution map and, therefore, their conservation.
IncluScience-Me was born at the University of Cordoba and, specifically, in the Research Group on Education and Biodiversity Management (Gesbio), and has been made possible thanks to the participation of the University of Castilla-La Mancha and the Research Institute for Hunting Resources of Ciudad Real (IREC), with the collaboration of the Spanish Foundation for Science and Technology - Ministry of Science, Innovation and Universities.
The Memory of the Herd: Documentary corpus of pastoral life.
This citizen science project, which has been active since July 2023, aims to gather knowledge and experiences from working and retired shepherds about herd management and livestock farming.
The entity responsible for the programme is the Institut Català de Paleoecologia Humana i Evolució Social, although the Museu Etnogràfic de Ripoll, Institució Milà i Fontanals-CSIC, Universitat Autònoma de Barcelona and Universitat Rovira i Virgili also collaborate.
The programme helps to interpret the archaeological record and contributes to preserving knowledge of pastoral practice. In addition, it values the experience and knowledge of older people, work that helps counter the negative connotation of "old age" in a society that prioritises "youth", so that older people are no longer considered passive subjects but active social agents.
Plastic Pirates Spain: Study of plastic pollution in European rivers.
This citizen science project, carried out over the last year with young people between 12 and 18 years of age in the regions of Castilla y León and Catalonia, aims to contribute to generating scientific evidence and environmental awareness about plastic waste in rivers.
To this end, groups of young people from different educational centres, associations and youth groups have taken part in sampling campaigns to collect data on the presence of waste and rubbish, mainly plastics and microplastics, on riverbanks and in the water.
In Spain, this project has been coordinated by the BETA Technology Centre of the University of Vic - Central University of Catalonia together with the University of Burgos and the Oxygen Foundation. You can access more information on their website.
These are just some examples of citizen science projects. You can find out more at the Observatory of Citizen Science in Spain, an initiative that brings together a wide range of educational resources, reports and other interesting information on citizen science and its impact in Spain. Do you know of any other projects? Send them to us at dinamizacion@datos.gob.es and we can publicise them through our dissemination channels.
Data literacy has become a crucial issue in the digital age. This concept refers to the ability of people to understand how data is used, how it is accessed, created, analysed, used or reused, and communicated.
We live in a world where data and algorithms influence everyday decisions and the opportunities people have to live well. Their effect can be felt in areas ranging from advertising and employment to criminal justice and social welfare. It is therefore essential to understand how data is generated and used.
Data literacy can involve many areas, but we will focus on its relationship with digital rights on the one hand and Artificial Intelligence (AI) on the other. This article proposes to explore the importance of data literacy for citizenship, addressing its implications for the protection of individual and collective rights and the promotion of a more informed and critical society in a technological context where artificial intelligence is becoming increasingly important.
The context of digital rights
More and more studies indicate that effective participation in today's data-driven, algorithm-driven society requires data literacy. Civil rights are increasingly translating into digital rights as our society becomes more dependent on digital technologies and environments. This transformation manifests itself in various ways:
- On the one hand, rights recognised in constitutions and human rights declarations are being explicitly adapted to the digital context. For example, freedom of expression now includes freedom of expression online, and the right to privacy extends to the protection of personal data in digital environments. Moreover, some traditional civil rights are being reinterpreted in the digital context. One example of this is the right to equality and non-discrimination, which now includes protection against algorithmic discrimination and against bias in artificial intelligence systems. Another example is the right to education, which now also extends to the right to digital education. The importance of digital skills in society is recognised in several legal frameworks and documents, both at national and international level, such as the Organic Law 3/2018 on Personal Data Protection and Guarantee of Digital Rights (LOPDGDD) in Spain. Finally, the right of access to the internet is increasingly seen as a fundamental right, similar to access to other basic services.
- On the other hand, rights are emerging that address challenges unique to the digital world, such as the right to be forgotten (in force in the European Union and some other countries that have adopted similar legislation1), which allows individuals to request the removal of personal information available online, under certain conditions. Another example is the right to digital disconnection (in force in several countries, mainly in Europe2), which ensures that workers can disconnect from work devices and communications outside working hours. Similarly, there is a right to net neutrality to ensure equal access to online content without discrimination by service providers, a right that is also established in several countries and regions, although its implementation and scope may vary. The EU has regulations that protect net neutrality, including Regulation 2015/2120, which establishes rules to safeguard open internet access. The Spanish Data Protection Act provides for the obligation of Internet providers to provide a transparent offer of services without discrimination on technical or economic grounds. Furthermore, the right of access to the internet - related to net neutrality - is recognised as a human right by the United Nations (UN).
This transformation of rights reflects the growing importance of digital technologies in all aspects of our lives.
The context of artificial intelligence
The relationship between AI development and data is fundamental and symbiotic, as data serves as the basis for AI development in a number of ways:
- Data is used to train AI algorithms, enabling them to learn, detect patterns, make predictions and improve their performance over time.
- The quality and quantity of data directly affect the accuracy and reliability of AI systems. In general, more diverse and complete datasets lead to better performing AI models.
- The availability of data in various domains can enable the development of AI systems for different use cases.
Data literacy has therefore become increasingly crucial in the AI era, as it forms the basis for effectively harnessing and understanding AI technologies.
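To make the relationship between data and model performance concrete, here is a minimal, illustrative sketch in Python with scikit-learn: a simple classifier is trained on a small labelled dataset and evaluated on data it has not seen. The dataset (scikit-learn's bundled iris sample) and the model choice are arbitrary examples, not a reference to any specific system mentioned in this post.

```python
# Minimal sketch of the data-to-model relationship: an algorithm learns patterns
# from labelled training data and is then evaluated on unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# The quality and quantity of this training split directly conditions performance
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # the model learns patterns from the data
predictions = model.predict(X_test)  # and generalises them to unseen cases

print(f"Accuracy on held-out data: {accuracy_score(y_test, predictions):.2f}")
```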
In addition, the rise of big data and algorithms has transformed the mechanisms of participation, presenting both challenges and opportunities. Algorithms, while they may be designed to be fair, often reflect the biases of their creators or the data they are trained on. This can lead to decisions that negatively affect vulnerable groups.
In this regard, legislative and academic efforts are being made to prevent this from happening. For example, the European Artificial Intelligence Act (AI Act) includes safeguards to avoid harmful biases in algorithmic decision-making. Specifically, it classifies AI systems according to their level of potential risk and imposes stricter requirements on high-risk systems. In addition, it requires the use of high quality data to train the algorithms, minimising bias, and provides for detailed documentation of the development and operation of the systems, allowing for audits and evaluations with human oversight. It also strengthens the rights of persons affected by AI decisions, including the right to challenge decisions made and their explainability, allowing affected persons to understand how a decision was reached.
The importance of data literacy in both contexts
Data literacy helps citizens make informed decisions and understand the full implications of their digital rights, which, as noted above, are in many respects universal civil rights. In this context, data literacy serves as a critical filter for full civic participation, enabling citizens to influence political and social decisions. That is, those who have access to data and to the skills and tools needed to navigate the data infrastructure effectively can intervene in and influence political and social processes in a meaningful way, something promoted by the Open Government Partnership.
On the other hand, data literacy enables citizens to question and understand these processes, fostering a culture of accountability and transparency in the use of AI. There are also barriers to participation in data-driven environments. One of these barriers is the digital divide (i.e. lack of access to infrastructure, connectivity and training, among others) and, indeed, the lack of data literacy. The latter is therefore a crucial concept for overcoming the challenges posed by the datification of human relations and the platformisation of content and services.
Recommendations for building data literacy
Part of the solution to addressing the challenges posed by the development of digital technology is to include data literacy in educational curricula from an early age.
This should cover:
- Data basics: understanding what data is, how it is collected and used.
- Critical analysis: acquiring the skills to evaluate the quality and source of data and to identify biases in the information presented. This involves recognising the potential biases that data may contain and that may arise in its processing, and building capacity to act in favour of open data and its use for the common good.
- Rights and regulations: information on data protection rights and how European laws affect the use of AI. This area would cover all current and future regulation affecting the use of data and its implication for technology such as AI.
- Practical applications: the possibility of creating, using and reusing open data available on portals provided by governments and public administrations, thus generating projects and opportunities that allow people to work with real data, promoting active, contextualised and continuous learning (a brief example of querying an open data portal follows this list).
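As an illustration of this last point, below is a short, hedged sketch in Python of how open data could be located programmatically through a portal's catalogue API. The endpoint shown (the datos.gob.es "apidata" catalogue) and the response fields are assumptions for illustration only; the portal's own API documentation is the authoritative reference.

```python
# Illustrative sketch: searching an open data catalogue API from Python.
# The endpoint, parameters and response fields below are assumptions for
# illustration; check the portal's API documentation for the real structure.
import requests

API_URL = "https://datos.gob.es/apidata/catalog/dataset"  # assumed endpoint

response = requests.get(API_URL, params={"_pageSize": 5}, timeout=30)
response.raise_for_status()

# A JSON response with a list of catalogue entries is assumed here
items = response.json().get("result", {}).get("items", [])
for item in items:
    # Field names are assumptions; inspect the JSON to see the actual structure
    print(item.get("_about"), "-", item.get("title"))
```

Once a concrete dataset and its download URL have been identified, it can be explored with the same kind of general-purpose tools shown earlier in this article, such as pandas.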
Educating people in the use and interpretation of data fosters a more critical society that is able to demand accountability in the use of AI. New data protection laws in Europe provide a framework that, together with education, can help mitigate the risks associated with algorithmic abuse and promote the ethical use of technology. In a data-driven society, where data plays a central role, data literacy must be fostered in citizens from an early age.
1 The right to be forgotten was first established in May 2014 following a ruling by the Court of Justice of the European Union. Subsequently, in 2018, it was reinforced with the General Data Protection Regulation (GDPR), which explicitly includes it in its Article 17 as a "right of erasure". In July 2015, Russia passed a law allowing citizens to request the removal of links on Russian search engines if the information "violates Russian law or is false or outdated". Turkey has established its own version of the right to be forgotten, following a similar model to that of the EU. Serbia has also implemented a version of the right to be forgotten in its legislation. In Spain, the Ley Orgánica de Protección de Datos Personales (LOPD) regulates the right to be forgotten, especially with regard to debt collection files. In the United States, the right to be forgotten is considered incompatible with the Constitution, mainly because of the strong protection of freedom of expression. However, there are some related regulations, such as the Fair Credit Reporting Act of 1970, which allows in certain situations the deletion of old or outdated information in credit reports.
2 Some countries where this right has been established include Spain, where it is regulated by Article 88 of Organic Law 3/2018 on Personal Data Protection; France, which in 2017 became the first country to pass a law on the right to digital disconnection; Germany, where it is included in the Working Hours and Rest Time Act (Arbeitszeitgesetz); Italy, under Law 81/2017; and Belgium. Outside Europe, it exists, for example, in Chile.
Content prepared by Miren Gutiérrez, PhD and researcher at the University of Deusto, expert in data activism, data justice, data literacy and gender disinformation. The contents and views reflected in this publication are the sole responsibility of the author.