When we think of open data our first intuition is usually directed towards data generated by public sector bodies in the exercise of their functions and made available for reuse by citizens and businesses, i.e. public sector open data or open public data. This is natural, because public sector information represents an extraordinary source of data and the intelligent use of this data, including its processing through artificial intelligence applications, has great transformative potential in all sectors of the economy, as recognised by the European directive on open data and re-use of public sector information.
One of the most interesting novelties introduced by the directive was the initial but expandable definition of six thematic categories of high-value datasets, whose re-use is associated with considerable benefits for society, the environment and the economy. These six areas (Geospatial, Earth Observation and Environment, Meteorology, Statistics, Companies and Company Ownership, and Mobility) are the ones that in 2019 were considered to have the greatest potential for the creation of value-added services and applications based on such datasets. However, seen from 2021, almost a year into the global health crisis, it seems clear that this list misses two key areas with a high potential impact on society, namely health and education.
Indeed, we find that on the one hand, educational institutions are explicitly exempted from some obligations in the directive, and on the other hand, health sector data are hardly mentioned at all. The directive, therefore, does not provide for a development of these two areas that the circumstances of the covid-19 pandemic have brought to the forefront of society's priorities.
The availability of health and education data
Although health systems, both public and private, generate and store an enormous amount of valuable data in people's medical records, the availability of these data is very limited due to the very high complexity of processing them in a secure way. Health-related datasets are usually only available to the entity that generates them, despite the great value that their release could have for the advancement of scientific research.
The same could be said for data generated by student interaction with educational platforms, which is also generally not available as open data. As in the health sector, these datasets are usually only available to their owners, for whom they are a valuable asset for the improvement of the platforms, which is only a small part of their potential value to society.
The directive states that high-value data should be published in open formats that can be freely used, re-used and shared by anyone for any purpose. Furthermore, in order to ensure maximum impact and facilitate re-use, high-value datasets should be made available for re-use with very few legal restrictions and at no cost.
Health data are highly sensitive from the point of view of individual privacy, so the delicate trade-off between respect for privacy and the need to support the advancement of scientific research must always be kept in mind. The consideration of health and education data as high-value open data should probably retain some specific restrictions due to the nature and sensitivity of these data, and promote mechanisms such as the donation of data for research purposes by patients or the exchange of data between researchers for the same purpose. In this sense, the 2018 regulation on data protection introduced the possibility of reusing data for research purposes, provided that the appropriate pseudonymisation measures and the rest of the legally stipulated guarantees are adopted.
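To make the idea of pseudonymisation more concrete, the following is a minimal sketch in Python of one common approach: replacing a direct identifier with a keyed hash. It is only an illustration under stated assumptions. The field names (`patient_id`, `name`, `diagnosis`), the `DIRECT_IDENTIFIERS` set and the helper functions are hypothetical and are not taken from any regulation or from the platforms mentioned in this article. A keyed hash on its own does not amount to the full set of legally stipulated guarantees.

```python
import hashlib
import hmac

# Hypothetical list of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"patient_id", "name"}


def pseudonymise_id(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    belonging to one patient can still be linked, but the original value
    cannot be recomputed by someone who does not hold the key.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymise_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudonym"] = pseudonymise_id(record["patient_id"], secret_key)
    return cleaned


if __name__ == "__main__":
    # Illustrative record and key only; a real key must be kept secret
    # and stored separately from the published dataset.
    example = {"patient_id": "A-001", "name": "Jane Doe", "diagnosis": "covid-19"}
    print(pseudonymise_record(example, secret_key=b"illustrative-key"))
```

In this sketch the body holding the key can still re-link records if needed, which is what distinguishes pseudonymisation from full anonymisation.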
The importance of public-private partnerships
Education and health are two areas where the private sector or public-private partnerships are making exciting strides in converting some of the potential of open data into benefits for society. Open data publishing is not the exclusive preserve of the public sector and there is a long tradition of private-public collaboration, largely channelled through universities. Let us look at some examples:
- There are a number of initiatives such as the pioneering UCI Machine Learning Repository, founded in 1987 as a repository of datasets used by the artificial intelligence community for the empirical analysis of machine learning algorithms. This repository has been cited more than 1000 times, which places it among the most cited resources in the computer science domain. In this and other repositories also managed by universities or foundations with donations from private companies, we can find open datasets released by companies or in whose creation or development they have actively collaborated.
- Large technology companies, no doubt inspired by these initiatives, also maintain open data search engines or repositories such as Google's dataset search engine, AWS's open data registry, or Microsoft Azure's datasets, where datasets related to health or education are increasingly common.
- In terms of data that can contribute to improving education, for example, The Open University publishes OULAD (Open University Learning Analytics Dataset), an open learning analytics dataset containing data on courses, students and their interactions with the virtual learning environment for seven courses. However, there are very few comparable datasets, even though their joint use in projects would undoubtedly allow further progress to be made in areas such as detecting the risk of students dropping out.
- As far as the health sector is concerned, it is worth highlighting the case of the Spanish platform HealthData 29, developed by Fundación 29, which aims to create the necessary infrastructure to make it possible to securely publish open health datasets so that they are available to the community for research purposes. As part of this infrastructure, Fundación 29 has published the Health Data Playbook, a guide for the creation, within the current technical and legal framework, of a public repository of data from health systems, so that they can be used in medical research. Microsoft has collaborated in the preparation of this guide as a technological partner and Garrigues as a legal partner, and it is aimed at organisations that carry out health research.
At the moment the platform only has available the Covid Data Save Lives (COVIDDSL) dataset published by the HM Hospitales University Hospital Group, composed of clinical data on interactions recorded in the covid-19 treatment process. However, it is an excellent example of the potential that we may be missing out on globally by not collecting and publishing more and better data on patients diagnosed with covid-19 in a systematised way and on a global scale. The creation of predictive models of disease progression in patients, the development of epidemiological models on the spread of the virus, or the extraction of knowledge on the behaviour of the virus for vaccine development are just some of the use cases that would benefit from greater availability of this data.
Education and health are two of the great concerns of all developed societies in the world because they are closely related to the well-being of their citizens. But perhaps we have never been more aware of this than in the last year and this represents an extraordinary opportunity to drive initiatives that contribute to unlocking more open health and education data. Whether as high-value data or in any other form, these datasets are key to enabling us to better react to future health crisis situations but also to help us overcome the aftermath of the current one.
Content prepared by Jose Luis Marín, Senior Consultant in Data, Strategy, Innovation & Digitalization.
The contents and points of view reflected in this publication are the sole responsibility of its author.
QMENTA is an advanced medical image storage, processing and visualization company focused on brain data analysis, specifically using MRI and related clinical data.
It provides state-of-the-art medical image processing algorithms transparently to accelerate the development of new therapies for neurological diseases through a scalable and collaborative cloud platform.
Spain has the second highest life expectancy in the world, behind only Japan. Spaniards live 83 years on average. This positive figure is tarnished by a negative one: the low birth rate, which is leading us towards an ageing population. This situation means that we need a more efficient health system to continue providing quality health services to citizens.
As in other sectors, improving efficiency depends on the necessary digital transformation, in which data in general, and open data in particular, have a leading role. Open data can help us better understand the situation of patients and, together with technologies such as big data or artificial intelligence systems, facilitate early detection of diseases. In short, they can help improve both the management and the provision of services.
But in an area where patient privacy is essential, we have a series of doubts: What types of data can be opened? What does the legislation say about it?
The report "Open data and health: technological context, stakeholders and legal framework", prepared by Julián Valero, tries to shed some light on this situation. For this, the following objectives are set:
- To understand the conditions, limitations and restrictions imposed by current legal regulations.
- To consider how the guarantees offered by the Law should be adapted to a new reality based on technological innovation.
The report begins by showing the current situation of the Spanish health system, gathering the challenges to be faced, but also the opportunities that come hand in hand with new technological trends, such as Internet of Things or the aforementioned Artificial Intelligence.
Once the context has been explained, the report focuses on the different stakeholders involved in the provision of health services, both public and private, and the main laws and regulations that affect each group. The novelties of the General Data Protection Regulation (GDPR) and its impact on the opening of health data are also addressed.

The report ends with a series of conclusions and recommendations to promote public policies in the field of health that drive improvements in the provision of health services.
You can download the full report below.
Health is one of the priority development fields in this century. Most analysts agree that health management, from all possible perspectives, will change radically in the coming years. The analysis of health data will set the way forward in the years ahead.
Life expectancy in developed countries increases as the century advances. In the last twenty years, average life expectancy in many developed countries has passed the 80-year mark. Japan, Spain, Switzerland and Singapore, among others, are already above 83 years of life expectancy and the trend continues at a constant growth rate.

Figure 1. Life expectancy in years according to CIA World Factbook 2013.
This introduction on life expectancy serves to motivate the central theme of this article. As we get older, the diseases that affect us evolve. A longer life expectancy does not necessarily mean a better quality of life in adulthood and old age. To live longer, it is necessary to develop better health care. Modern societies need to make a successful transition from treatment to prevention. That is: prevent rather than cure.
But to improve prevention, we need to understand the risks better and anticipate future complications. The analysis of the data related to our health is vital to face this transition. Many tasks and actions are necessary before continuous health data analysis strategies can be established.
By nature, data related to health are sensitive data. Personal health data have a direct impact on our work and personal relationships and can have a very noticeable impact on our economy, at both the personal and the societal level. Among others, the challenges of health data analysis are:
- Generation of public datasets.
- Standard mechanisms for health data anonymization.
- Real-time health data collection tools.
- Health data models agreed by the scientific community.
- Health data analysis tools prepared for high data volumes.
- Specialist profiles, both health experts and data scientists specialized in this field (semi-structured data and semantic technologies).
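As an illustration of the second challenge in the list above, the sketch below shows one widely known check that can be run before publishing a dataset: k-anonymity, which requires every combination of quasi-identifying attributes to be shared by at least k records. It is a simplified example and not a standard mechanism endorsed by this article. The function names, column names and sample records are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same
    combination of quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


if __name__ == "__main__":
    # Made-up records with generalised age bands and truncated postcodes.
    sample = [
        {"age_band": "60-69", "postcode": "280**", "diagnosis": "diabetes"},
        {"age_band": "60-69", "postcode": "280**", "diagnosis": "asthma"},
        {"age_band": "70-79", "postcode": "080**", "diagnosis": "diabetes"},
    ]
    # The third record is alone in its group, so the check fails for k=2.
    print(is_k_anonymous(sample, ["age_band", "postcode"], k=2))
```

A failed check suggests generalising values further (wider age bands, shorter postcodes) or suppressing the isolated records before release.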
Digital transformation of the health sector
The digital transformation of the health sector represents one of the greatest challenges for public and private health institutions and systems. Many hospitals in developed countries have begun to digitize some of the most important data related to our health, especially the data recorded in face-to-face visits to doctors. Electronic Health Records (EHR) and diagnostic tests (for example, medical images or clinical analyses) are the records with the highest degree of digitization. While the degree of digitization of these examples can be high, the way it has been approached differs between countries and systems. Transforming historical, analogue clinical records into digital information adds very little value compared to the effort and investment needed. However, tackling the digitization of clinical records with a focus on the subsequent intelligent analysis of the data can mean a revolution with an incalculable impact. For example, the implementation of ontologies specially designed for the medical domain, such as SNOMED-CT, radically changes the future exploitation of medical data and enables a superior intelligence layer in which Artificial Intelligence acts as an assistant to the doctors and nurses of the future.
Some public repositories
There are different repositories where you can find open data sets related to health. Most available data are statistics related to health indicators. However, there are more specialized repositories where it is possible to find data sets to perform advanced data analysis.
For example, the health systems of the United States and the United Kingdom, respectively, publish their health data in the following repositories:
Other international organizations, such as the World Health Organization (WHO) or UNICEF, also have open data repositories:
- UNICEF offers statistics on the situation of women and children worldwide.
- The World Health Organization offers world hunger, health, and disease statistics.
Beyond statistical data, Kaggle, a specialized data science website, regularly holds open competitions for teams to solve data-based challenges. For example, in one of the Kaggle competitions, the challenge was to predict hospital readmissions for diabetes. To solve the challenge, participants are given a data set (duly anonymized) composed of 65 records of patients with diabetes and 50 fields that include information on gender, age, weight, etc.

Figure 2. Excerpt from the data set available for the diabetes challenge.
In summary, the systematic analysis of health data opens the doors to predictive medicine. To enable technologies that assist health professionals of the future, it is necessary to build sustainable, scalable and long-lasting data strategies. Collecting, storing, modeling and analyzing health data is the key to a future where healthcare is something more than a mere contact with patients.
Content prepared by Alejandro Alija, expert in Digital Transformation and innovation.
Contents and points of view expressed in this publication are the exclusive responsibility of its author.
On Friday, April 21, a workshop dedicated to data science in the social and health sector will take place at Media-Lab Prado, Madrid. The meeting is designed for professionals and researchers in the health sciences who specialize in data analysis for social purposes.
The event will start with a specific session on the analysis of urban mobility through big data, followed by two talks related to the healthcare sector under the titles "Big4Cast: prediction of crisis in bipolar disorder" and "machine learning in EGG predictive analysis".
To close the day, the attendees will be able to learn about the work of other experts in the field through the poster exhibition that will take place during the workshop. Those professionals who have sent their pieces of work to vlopezlo@ucm.es will obtain the corresponding certificate. Afterwards, a round table will be held where five representatives of public and private entities will discuss the following topics:
- Social development with maps (ESRI España).
- Healthcare research (Fujitsu).
- Madrid Salud (WAP).
- Innovation in the cloud (AWS).
- Open Data (City of Madrid).
To attend the workshop, participants need to register beforehand through the Eventbrite webpage.
As healthcare systems worldwide become increasingly digitized, medical researchers and experts have more data than ever. Such information, open and accessible, provides major opportunities not only for national health policies but also for diagnoses and tailored treatments, known as stratified medicine.
Two approaches have emerged to the storage, ownership, use and responsibility of health data. On the one hand, there is information held by public, academic and civic entities (medical expenses, budgets, epidemiological trends, scientific results…) and, on the other hand, private user-generated data collected from wearable devices, social media and online searches. Regardless of their origin, opening up and reusing data has a positive impact on society as a whole.
Aware of the potential of this resource, the Ministerio de Salud Pública from Uruguay has launched the initiative A tu servicio, through which citizens have access to open data on national healthcare institutions. Thus, users can filter and compare the medical centres according to different indicators in order to choose the institution that best adapts to their needs.
Apart from boosting citizen participation and improving decision making, open data may become a key factor in controlling diseases. This is the case of a project carried out by the government of Singapore since 2005, in which the National Environment Agency shares information about dengue in order to prevent its spread, publishing the active cluster areas in the country.
This campaign has been designed with a two-fold objective: to raise residents' awareness of this health problem and to alert the community to the seriousness of the disease. Moreover, data have been opened to let the infomediary sector reuse them and create solutions and services which help fight dengue.
Nevertheless, despite the great benefits derived from health data reuse, it also brings new challenges, in particular the storage, real-time processing and integration of the large volume of existing datasets. It is also true that, thanks to technological advances and cloud-computing services, these obstacles are increasingly easy to overcome. After all, information is no longer stored in a single place, which enables different agents to access the data and facilitates its re-use.
This is the case of the international INDEPTH network, formed by different research centres in Africa and Asia, which collects health and demographic data from developing countries in order to improve health surveillance systems, as many nations lack the necessary policies, communities and infrastructures to safeguard and open up those data. Through the coordination of its members, the information is centralised in an open data catalogue that users can freely use after registering.
At European level, the UK leads the opening of health data: in 2014, the National Health Service (NHS) launched its own open data portal, which allows citizens to compare health services in the country and make decisions based on public information. Such is the impact of open data that, according to the NHS National Director, since data about cardiac surgery were published, the mortality rate in this type of surgical procedure has fallen by one third in the country. Likewise, studies based on data analysis have allowed researchers to study the impact of new medicines, improving the population's quality of life.
Today more than ever, the value of open data in the global health sector is clear; however, public bodies, the infomediary sector, civic organizations and citizens still need to join forces to make further progress in the exploitation of open data. After all, a shared awareness is essential for open culture, regardless of the type of information, to take root at every socioeconomic level worldwide.
Thanks to the reuse of public sector information, new businesses and new business models are beginning to emerge. But, for that, public entities have to open the data they store and place them at the service of society. According to the Characterization Study of the Infomediary Sector, in 2014 the turnover of infomediary businesses in Spain was between 450 and 500 million euros and the sector provided approximately 4,200 to 4,700 jobs. In fact, the economic potential of open data and big data in Europe is 200,000 million euros, and in the USA seven sectors alone could generate more than $3 trillion a year in additional value as a result of open data.
The economic value of open data is already a reality. It has become a raw material for the agriculture, nutrition and health sectors, boosting employment, providing services tailored to society's needs and developing innovative solutions. Open data is so important that a panel at the 4th International Open Data Conference was dedicated to business models based on open data. The conference was also the occasion to present the Open Data Impact Map, a searchable, centralized database of open data use cases from all around the world. Still in beta, this map will be the successor to the Open Data 500.
Over the last few years the health sector has focused on open data to reduce costs, increase revenues, save time and improve medical diagnosis. Mastodon C is an excellent example. This startup has been working with Open Healthcare UK on NHS prescriptions, reusing PSI to save millions of pounds on medical prescriptions.
The scientific community is also aware of the importance of open data, not only to develop new products and services but also to verify the integrity of any research or health advance. After all, if we have access to the data, we can evaluate the effectiveness of treatments and the fairness of medical studies.
Thanks to data re-use, Google created the web service Google Dengue Trends, a near real-time tool which worked by capturing disease-related queries typed into Google and displaying a map indicating dengue activity. Though the map is disabled, the historic estimates produced by Google Dengue Trends are available for reuse by any user.
Technological advances have made sharing information and extracting data much easier. The agricultural and environmental sectors have made use of leading-edge technology to create new business opportunities. The UK government has released 1000 farming datasets to boost national farming productivity and help businesses and consumers make decisions. The data will be used to improve crop yields and deal with disease outbreaks. This is what Plantwise offers: a global programme to increase food security and improve rural livelihoods by reducing crop losses.
Together with GODAN (Global Open Data for Agriculture and Nutrition), the Open Data Institute has published a discussion paper showing how open data is a powerful tool for solving problems in agriculture such as drought, pests or food security.
Another noteworthy initiative is the Foodie Project: a European cloud platform hub that stores data related to the agricultural and farming sector, drawn from open data portals and from sensors located in crops and farms. This infrastructure integrates different datasets and provides high-value applications to support crop and livestock farmers.
Analysis tools, food products, eHealth apps… There are as many business opportunities based on open data as datasets. The key issue is to identify the needs and create solutions that transform the public or private information into added value services.
Tags: open data, agriculture sector, health sector, Foodie, GODAN, ODI, Plantwise, Google Dengue Trends, Mastodon C, infomediary sector