News

Effective equality between men and women is a common goal to be achieved as a society. This is stated by the United Nations (UN), which includes "Achieve gender equality and empower all women and girls" as one of the Sustainable Development Goals to be achieved by 2030.

For this, it is essential to have quality data that show us the reality and the situations of risk and vulnerability that women face. This is the only way to design effective policies that are more equitable and informed, in areas such as violence against women or the fight to break glass ceilings. This has led to an increasing number of organisations opening up data related to gender inequality. However, according to the UN itself, less than half of the data needed to monitor gender inequality is currently available.

What data are needed?

In order to understand the real situation of women and girls in the world, it is necessary to systematically include a gender analysis in all stages of the production of statistics. This ranges from using gender-sensitive concepts to broadening the sources of information in order to highlight phenomena that are currently not being measured.

Gender data does not only refer to sex-disaggregated data. Data also need to be based on concepts and definitions that adequately reflect the diversity of women and men, capturing all aspects of their lives and especially those areas that are most susceptible to inequalities. In addition, data collection methods need to take into account stereotypes and social and cultural factors that may induce gender bias in the data.

Resources for gender mainstreaming in data

At datos.gob.es we have already addressed this issue in other articles, providing some initial clues on the creation of datasets with a gender perspective, but more and more organisations are becoming involved in this area, producing materials that can help to alleviate the problem.

The UN Statistics Division produced the report Integrating a Gender Perspective into Statistics to provide the methodological and analytical information needed to improve the availability, quality and use of gender statistics.  The report focuses on 10 themes: education; work; poverty; environment; food security; power and decision-making; population, households and families; health; migration, displaced persons and refugees; and violence against women. For each theme, the report details the gender issues to be addressed, the data needed to address them, data sources to be considered, and specific conceptual and measurement issues. The report also discusses in a cross-cutting manner how to generate surveys, conduct data analysis or generate appropriate visualisations.

UN agencies are also working on this issue in their various areas of action. For example, UNICEF has developed guides of interest such as “Gender statistics and administrative data systems”, which compiles resources such as conceptual and strategic frameworks, practical tools and use cases, among others.

Another example is the World Bank. This organisation has a gender-sensitive data portal, where it offers indicators and statistics on various aspects such as health, education, violence or employment. The data can be downloaded in CSV or Excel format, and are also presented through narratives and visualisations that make them easier to understand. They can also be accessed through an API. This portal also includes a section where tools and guidelines are compiled to improve the collection, use and dissemination of gender statistics. These materials are focused on specific sectors, such as agri-food or domestic work. It also has a section on courses, where we can find, among others, training on how to communicate and use gender statistics.
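As a rough illustration of programmatic access, the sketch below builds a request URL for the World Bank's general Indicators API (v2). It is an assumption that the gender portal's figures can be reached through this general endpoint. The helper name `indicator_url`, the country code `ESP` and the indicator code `SG.GEN.PARL.ZS` (share of parliamentary seats held by women) are chosen only for the example.

```python
from urllib.parse import urlencode

# Base address of the World Bank Indicators API (v2).
BASE = "https://api.worldbank.org/v2"

def indicator_url(country: str, indicator: str, **params: str) -> str:
    """Build a request URL for one indicator and one country (hypothetical helper)."""
    # Ask for JSON; keep ':' unescaped so date ranges stay readable.
    query = urlencode({"format": "json", **params}, safe=":")
    return f"{BASE}/country/{country}/indicator/{indicator}?{query}"

# Example: women's share of parliamentary seats in Spain, 2015 to 2020.
url = indicator_url("ESP", "SG.GEN.PARL.ZS", date="2015:2020")
print(url)
```

The resulting address can then be fetched with any HTTP client. The code only builds the URL, so it runs without a network connection.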

Initiatives in Spain

If we focus on our country, we also find very interesting initiatives. We have already talked about GenderDataLab.org, a repository of open data with a gender perspective. Its website also includes guides on how to generate and share these datasets. If you want to know more about this project, we invite you to watch this interview with Thais Ruiz de Alda, founder and CEO of Digital Fems, one of the entities behind this initiative.

In addition, an increasing number of agencies are implementing mechanisms to publish gender-sensitive datasets. The Government of the Canary Islands has created the web tool “Canary Islands in perspective” to bring together different statistical sources and provide a scorecard with data disaggregated by sex, which is continuously updated. Another project worth mentioning is the “Women and Men in the Canary Islands” website, the result of a statistical operation designed by the Canary Islands Statistics Institute (ISTAC) in collaboration with the Canary Islands Institute for Equality. It compiles information from different statistical operations and analyses it from a gender perspective.

The Government of Catalonia has also included this issue in its Government Plan. In the report "Prioritisation of open data relating to gender inequality for the Government of Catalonia" they compile bibliography and local and international experiences that can serve as inspiration for both the publication and use of this type of datasets. The report also proposes a series of indicators to be taken into account and details some datasets that need to be opened up.

These are just a few examples that show the commitment of civil associations and public bodies in this area. It is a field in which we must continue to work in order to obtain the data needed to assess the real situation of women in the world and thus design political solutions that will enable a fairer world for all.

Blog

On 24 February Europe entered a scenario that not even the data could have predicted: Russia invaded Ukraine, unleashing the first war on European soil so far in the 21st century.

Almost five months later, on 26 September, the United Nations (UN) published its official figures: 4,889 dead and 6,263 wounded. According to the official UN data, month after month, the reality of the Ukrainian victims was as follows:

Date              Deceased   Injured
24-28 February         336       461
March                3,028     2,384
April                  660     1,253
May                    453     1,012
June                   361     1,029
1-3 July                51       124
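As a quick consistency check, the monthly figures in the table can be added up and compared with the totals quoted above. A minimal sketch, using only the numbers listed in the table (the variable names are illustrative):

```python
# Monthly civilian casualties as listed in the table: (deceased, injured).
monthly = {
    "24-28 February": (336, 461),
    "March": (3028, 2384),
    "April": (660, 1253),
    "May": (453, 1012),
    "June": (361, 1029),
    "1-3 July": (51, 124),
}

# Sum each column across all periods.
total_deceased = sum(deceased for deceased, _ in monthly.values())
total_injured = sum(injured for _, injured in monthly.values())

print(total_deceased, total_injured)
```

The sums come to 4,889 deceased and 6,263 injured, which matches the official totals cited in the text.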

 

According to data extracted by the mission that the UN High Commissioner for Human Rights has been carrying out in Ukraine since Russia invaded Crimea in 2014, the total number of civilians displaced as a result of the conflict is more than 7 million people.

However, as in other areas, the data serve not only to develop solutions, but also to gain an in-depth understanding of aspects of reality that would otherwise not be possible. In the case of the war in Ukraine, the collection, monitoring and analysis of data on the territory allows organisations such as the United Nations to draw their own conclusions.

With the aim of making visible how data can be used to achieve peace, we will now analyse the role of data in relation to the following tasks:

Prediction

In this area, data are used to try to anticipate situations and plan an appropriate response to the anticipated risk. Whereas before the outbreak of war, data was used to assess the risk of future conflict, it is now being used to establish control and anticipate escalation.

For example, satellite images provided by applications such as Google Maps have made it possible to monitor the advance of Russian troops. Similarly, visualisers such as Subnational Surge Tracker identify peaks of violence at different administrative levels: states, provinces or municipalities.

Information

It is just as important to know the facts in order to prevent violence as it is to use them to limit misinformation and communicate the facts objectively, truthfully and in line with official figures. To achieve this, fact-checking applications have begun to be used, capable of responding to fake news with official data.

Among them is Newsguard, a verification entity that has developed a tracker that gathers all the websites that share disinformation about the conflict, placing special emphasis on the most popular false narratives circulating on the web. It even catalogues this type of content according to the language in which it is promoted.

Material damage

The data can also be used to locate material damage and track the occurrence of new damage. Over the past months, the Russian offensive has damaged the Ukrainian public infrastructure network, rendering roads, bridges, water and electricity supplies, and even hospitals unusable.

Data on this reality is very useful for organising a response aimed at reconstructing these areas and sending humanitarian assistance to civilians who have been left without services.

In this sense, we highlight the following use cases:

  • The United Nations Development Programme's (UNDP) machine learning algorithm has been developed and improved to identify and classify war-damaged infrastructure.
  • In parallel, the HALO Trust mines social media, satellite imagery and even geographic data to help identify areas with 'explosive remnants'. Thanks to these findings, organisations deployed across the Ukrainian terrain can move more safely to organise a coordinated humanitarian response.
  • The light information captured by NASA satellites is also being used to build a database to help identify areas of active conflict in Ukraine. As in the previous examples, this data can be used to track and send aid to where it is most needed.

Human rights violations and abuses

Unfortunately, in such conflicts, violations of the human rights of the civilian population are the order of the day. In fact, according to experience on the ground and information gathered by the UN High Commissioner for Human Rights, such violations have been documented throughout the entire period of war in Ukraine.

In order to understand what is happening to Ukrainian civilians, monitoring and human rights officers collect data, public information and first-person accounts of the war in Ukraine. From this, they develop a mosaic map that facilitates decision-making and the search for just solutions for the population.

Another very interesting work developed with open data is carried out by Conflict Observatory. Thanks to the collaboration of analysts and developers, and the use of geospatial information and artificial intelligence, it has been possible to discover and map war crimes that might otherwise remain invisible.

Migratory movements

Since the outbreak of war last February, more than 7 million Ukrainians have fled the war and thus their own country. As in previous cases, data on migration flows can be used to bolster humanitarian efforts for refugees and IDPs.

Some of the initiatives where open data contributes include the following:

The Displacement Tracking Matrix is a project developed by the International Organization for Migration and aimed at obtaining data on migration flows within Ukraine. Based on the information provided by approximately 2,000 respondents through telephone interviews, a database was created and used to ensure the effective distribution of humanitarian actions according to the needs of each area of the country.

Humanitarian response  

Similar to the analysis carried out to monitor migratory movements, the data collected on the conflict also serves to design humanitarian response actions and track the aid provided.

In this line, one of the most active actors in recent months has been the United Nations Population Fund (UNFPA), which created a dataset containing updated projections by gender, age and Ukrainian region. In other words, thanks to this updated mapping of the Ukrainian population, it is much easier to think about what needs each area has in terms of medical supplies, food or even mental health support.

Another initiative that is also providing support in this area is the Ukraine Data Explorer, an open source project developed on the Humanitarian Data Exchange (HDX) platform that provides collaboratively collected information on refugees, victims and funding needs for humanitarian efforts.

Finally, the data collected and subsequently analysed by Premise provides visibility on areas with food and fuel shortages. Monitoring this information is really useful for locating the areas of the country with the least resources for people who have migrated internally and, in turn, for signalling to humanitarian organisations which areas are most in need of assistance.

Innovation and the development of tools capable of collecting data and drawing conclusions from it is undoubtedly a major step towards reducing the impact of armed conflict. Thanks to this type of forecasting and data analysis, it is possible to respond quickly and in a coordinated manner to the needs of civil society in the most affected areas, without neglecting the refugees who are displaced thousands of kilometres from their homes.

We are facing a humanitarian crisis that has generated more than 12.6 million cross-border movements. Specifically, our country has attended to more than 145,600 people since the beginning of the invasion and more than 142,190 applications for temporary protection have been granted, 35% of them to minors. These figures make Spain the fifth Member State with the highest number of favourable temporary protection decisions. Likewise, more than 63,500 displaced persons have been registered in the National Health System and, with the start of the academic year, there are 30,919 displaced Ukrainian students enrolled in school, of whom 28,060 are minors.


Content prepared by the datos.gob.es team.

Reuser company

Estudio Alfa is a technology company dedicated to offering services that promote the image of companies and brands on the Internet, including the development of apps. To carry out these services, they use techniques and strategies that comply with usability standards and favour positioning in search engines, helping their clients' websites to receive more visitors and, in turn, more potential clients. They also have special experience in the production and tourism sectors.

 

News

The deadline for receiving applications to participate in the IV Aporta Challenge closed on 15 February. In total, 38 valid proposals were received in due time and form, all of high quality, whose aim is to promote improvements in the health and well-being of citizens through the reuse of data offered by public administrations.

Disruptive technologies, key to extracting maximum value from data

According to the competition rules, in this first phase, participants had to present ideas that identified new opportunities to capture, analyse and use data intelligence in the development of solutions of all kinds: studies, mobile applications, services or websites.

All the ideas seek to address various challenges related to health and wellbeing, many of which have a direct impact on our healthcare system, such as improving the efficiency of services, optimising resources or boosting transparency. Some of the areas addressed by participants include pressure on the health system, diagnosis of diseases, mental health, healthy lifestyles, air quality and the impact of climate change.

Many of the participants have chosen to use disruptive technologies to address these challenges. Among the proposals, we find solutions that harness the power of algorithms to cross-reference data and determine healthy habits or predictive models that allow us to know the evolution of diseases or the situation of the health system. Some even use gamification techniques. There are also a large number of solutions aimed at bringing useful information to citizens, through maps or visualisations.

Likewise, the specific groups at which the solutions are aimed are diverse: we find tools aimed at improving the quality of life of people with disabilities, the elderly, children, individuals who live alone or who need home care, etc.

Proposals from all over Spain and with a greater presence of women

Teams and individuals from all over Spain have been encouraged to participate in the Challenge. We have representatives from 13 Autonomous Communities: Madrid, Catalonia, the Basque Country, Andalusia, Valencia, the Canary Islands, Galicia, Aragon, Extremadura, Castile and Leon, Castile-La Mancha, La Rioja and Asturias.

25% of the proposals were submitted by individuals and 75% by multidisciplinary teams made up of various members. The same 75/25 split is found between natural persons (75%) and legal entities (25%). In the latter category, we find teams from universities, organisations linked to the Public Administration and different companies.

It is worth noting that in this edition the number of women participants has increased, demonstrating the progress of our society in the field of equality. Two editions ago, 38% of the proposals were submitted by women or by teams with women members. Now that number has risen to 47.5%. While this is a significant improvement, there is still work to be done in promoting STEM subjects among women and girls in our country.
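To put this change in context, the two shares quoted above can be compared both in percentage points and in relative terms. A minimal sketch, using only the 38% and 47.5% figures from the text:

```python
# Share of proposals submitted by women or by teams with women members.
share_two_editions_ago = 38.0  # percent, as quoted in the text
share_this_edition = 47.5      # percent, as quoted in the text

# Absolute change, in percentage points.
point_change = share_this_edition - share_two_editions_ago

# Relative change with respect to the earlier share.
relative_change = point_change / share_two_editions_ago

print(f"{point_change:.1f} percentage points, {relative_change:.0%} relative increase")
```

This gives an increase of 9.5 percentage points, or 25% in relative terms.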

Jury deliberation begins

Once the proposals have been accepted, it is time for assessment by the jury, which is made up of experts in the field of innovation, data and health. The assessment will be based on a series of criteria detailed in the rules, such as the overall quality and clarity of the proposed idea, the data sources used or the expected impact of the proposed idea on improving the health and well-being of citizens.

The 10 proposals with the best evaluation will move on to phase II, and will have a minimum of two months to develop the prototype resulting from their idea. The proposals will be presented to the same jury, which will score each project individually. The three prototypes with the highest scores will be the winners and will receive a prize of 5,000, 4,000 and 3,000 euros, respectively.

Good luck to all participants!

Blog

Today, 8 March, is the day on which we commemorate women's struggle to achieve their full participation in society, give visibility to current gender inequality and demand global action for effective equality of rights in all areas.

However, the data seem to indicate that we still have some way to go in this respect. 70% of the 1.3 billion people living in poverty are women. Women predominate in global food production (up to 80% in some areas), but own less than 10% of the land. Eighty per cent of people displaced by disasters and climate-related changes worldwide are women and girls. And the situation for women has only worsened due to the pandemic, causing the estimate of the time needed to close the current gender gap to now grow to more than 135 years.

The importance of data in the fight for equality

It is therefore a fact that women have fallen behind on many of the sustainable development indicators, an inequality that is also being replicated in the digital world - and even amplified through the increasing use of algorithms that lack the necessary training data to be representative of women's reality. But it is also a fact that we do not even have all the data we need to know with certainty where we stand on a large number of key indicators.

There is a widespread shortage of gender data that cuts across all economic and social sectors. The World Bank, the European Union, the OECD, the United Nations, UNICEF, the ITU or the IMF - more and more international bodies are making their own particular efforts to compile their own gender databases. However, indicators are still lacking in many key areas, in addition to other important gaps in the quality of existing data that are often incomplete or outdated.

This lack of data is something that can be particularly problematic when it comes to such sensitive issues as gender-based violence - an area where we are fortunately seeing more and more data globally, including some great and encouraging examples such as the ILDA-led femicide data initiative. This is a very important step forward because it is even more difficult to improve when we don't even know what the current situation is. Data, and the governance policies we create to manage it, can also be sexist.

Data are tools for making better decisions and better policies. They allow us to set goals and measure our progress. Data has therefore become an indispensable tool for creating social impact in communities. This is why the lack of data on the lives of women and girls is so damaging.

Addressing the gender gap through data

In seeking solutions to this problem, and thus working for gender equality also through data, it is crucial that we involve the protagonists and give them a voice. In this way, through their own experiences, we can develop more inclusive processes for data collection, analysis and publication. We will then be in a much better position to use data as an inclusive tool to address gender equality. Catherine D'Ignazio and Lauren Klein's excellent Data Feminism Handbook provides a set of strategies and principles to guide us in doing this:

  1. Examine power - Data feminism begins by looking at how power operates in the world.
  2. Challenge power - We must commit to challenging power structures when they are unequal and working for equity.
  3. Elevate emotion and embodiment - Data feminism teaches us to value multiple forms of knowledge, including that which comes from people.
  4. Rethink binarisms and hierarchies - We must challenge gender binarism, as well as other systems of quantification and classification that could lead to various forms of marginalisation.
  5. Embrace pluralism - The most complete knowledge emerges from synthesising multiple perspectives, prioritising local knowledge and experiences.
  6. Consider context - Data are neither neutral nor objective. They are products of unequal social relations, and understanding that context will be essential to ethical and accurate analysis.
  7. Make the work visible - The work of data science is the collaborative product of many people. All of this work must be made visible, so that it can be recognised and valued.

Our options for helping to reduce the data gap

In order to make progress in this fight for equality, we need much more gender-disaggregated data that adequately reflects the concerns of women and girls, their diversity and all aspects of their lives. We can and should all do our part in drawing attention to the disadvantages women face through data. Here are some tips:

  • Start by always collecting and publishing data disaggregated by gender.
  • Always use women as a reference group in our calculations when we are dealing with inequalities that affect them directly.
  • Document the decisions we make and our methodologies in working with gender data, including any changes in our approaches over time and their justification.
  • Always share raw and complete data in an open and reusable format. In this way, even if we have not focused on the challenges women face, at least others can do so using the same data.
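The first two tips can be sketched in a few lines of code. The example below assumes a small set of records disaggregated by gender, with figures invented purely for illustration, and expresses each group's average relative to the women's average, so that women act as the reference group. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records (gender, monthly income); all values are invented.
records = [
    ("women", 1800), ("women", 2200),
    ("men", 2400), ("men", 2600),
]

def mean_by_gender(rows):
    """Average value per gender, computed from disaggregated records."""
    groups = defaultdict(list)
    for gender, value in rows:
        groups[gender].append(value)
    return {gender: mean(values) for gender, values in groups.items()}

def gap_relative_to_women(means):
    """Difference of each group from the women's mean, as a share of that mean."""
    reference = means["women"]
    return {gender: (m - reference) / reference for gender, m in means.items()}

means = mean_by_gender(records)
gaps = gap_relative_to_women(means)
print(means)
print(gaps)
```

With these invented figures, the men's average is 25% above the women's average. Keeping the raw records alongside the computed gap follows the last tip, since others can then reuse the same data for their own analyses.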

Together we can make the invisible visible and finally ensure that every single woman and girl in the world is counted. The situation is urgent and now is the time to make a determined bid to close the data gap as a necessary tool to close the gender gap as well.


Content prepared by Carlos Iglesias, Open data Researcher and consultant, World Wide Web Foundation. The contents and views expressed in this publication are the sole responsibility of the author.

Blog

Today, no one can deny that open data holds great economic power. The European Commission itself estimates that the turnover of open data in the EU27 could reach 334.2 billion euros in 2025, driven by its use in areas linked to disruptive technologies such as artificial intelligence, machine learning or language technologies.

But in addition to its economic impact, open data also has an important value for society: it provides information that makes social reality visible, driving informed decision-making for the common good.

There are thousands of areas where open data is essential, from refugee crises to the inclusion of people with disabilities, but in this article we will focus on the scourge of gender violence.

Where can I obtain data on the subject?

Globally, agencies such as the UN, the World Health Organization and the World Bank offer resources and statistics related to violence against women.

In our country, local, autonomous and state agencies publish related datasets. To facilitate unified access to them, the Government Delegation against Gender Violence has a statistical portal that includes in a single space data from various sources such as the Ministry of Finance and Public Administration, the General Council of the Judiciary or the Public Employment Service of the Ministry of Employment and Social Security. The user can cross-reference variables and create tables and graphs to facilitate the visualization of the information, as well as export the data sets in CSV or Excel format. 

Projects to raise awareness and visibility

But data alone can be complicated to understand. Data need a context that gives them meaning and transforms them into information and knowledge. This is where different projects arise that seek to bring data to the public in a simple way.

There are many associations and organizations that take advantage of published data to create visualizations and stories with data that help to raise awareness about gender violence. As an example, the Barcelona Open Data Initiative is developing the "DatosXViolenciaXMujeres" project. It is a visual and interactive tour on the impact of gender violence in Spain and by Autonomous Communities during the period 2008-2020, although it is updated periodically. Using data storytelling techniques, it shows the evolution of gender violence within the couple, the judicial response (orders issued and final convictions), the public resources allocated, the impact of COVID-19 in this area and crimes of sexual violence. Each graph includes links to the original source and to places where the data can be downloaded so that they can be reused in other projects.

Another example is "Datos contra el ruido" (Data against noise), developed within the framework of GenderDataLab, a collaborative platform for the digital common good that has the support of various associations, such as Pyladies or Canodron, and the Barcelona City Council, among others. This platform promotes the inclusion of the gender perspective in the collection of open data through various projects such as the aforementioned "Datos contra el ruido", which makes visible and understandable the information published by the judicial system and the police on gender violence. Through data and visualizations, it provides information on the types of crimes or their geographical distribution throughout our country, among other issues. As with "DatosXViolenciaXMujeres", a link to the original source of the data and download spaces are included.

Tools and solutions to support victims

But in addition to providing visibility, open data can also give us information on the resources dedicated to helping victims, as we saw in some of the previous projects. Making this information available to victims in a quick and easy way is essential. Maps showing the location of help centers are of great help, such as this one from the SOL.NET project, with information on organizations that offer support and care services for victims of gender-based violence in Spain. Or this one with the centers and social services of the Valencian Community aimed at disadvantaged groups, including victims of gender violence, prepared by the public institution itself.

This information is also incorporated in applications aimed at victims, such as Anticípate. This app not only provides information and resources to women in vulnerable situations, but also has an emergency call button and allows access to legal, psychological or even self-defense advice, facilitating access to a social criminologist.

In short, we are facing a particularly sensitive issue, about which we must continue to raise awareness and which we must fight to end. It is a task to which open data can make a significant contribution.

If you know of any other example that shows the power of open data in this field, we encourage you to share it in the comments section or send us an email to dinamizacion@datos.gob.es.


Content prepared by the datos.gob.es team.

News

Fifteen experts from the fields of innovation, data and health will be in charge of evaluating the proposals received in the IV edition of the Aporta Challenge, the competition that seeks to reward ideas and prototypes that promote improvements in a specific sector (in this case, health and well-being) through the use of open data.

The names of the members of the jury have become known through a resolution published in the Red.es electronic headquarters. Among them we find representatives of the Public Administrations, organizations linked to the digital economy, universities and data communities. Do you want to know who they are?

Organizations linked to digital advancement

The jury includes a series of representatives of public organizations at the national and regional level focused on the digitization and digital transformation of our country.

- Alberto Palomo Lozano, Chief Data Officer of the Data Office, dependent on the Secretary of State for Digitalization and Artificial Intelligence of the Ministry of Economic Affairs and Digital Transformation (MINECO). Among its functions is the promotion of the sharing, management and use of data throughout all productive sectors.

- Miguel Valle del Olmo, Deputy Director General of Artificial Intelligence and Digital Enabling Technologies of the Secretary of State for Digitalisation and Artificial Intelligence (MINECO), in charge of the design and implementation of the National Artificial Intelligence Strategy of Spain.

- Santiago Graña Dominguez, Deputy Director General of Planning and Governance of the Digital Administration of MINECO. This is the body in charge of promoting the process of rationalization of information and communication technologies in the scope of the General Administration of the State and its Public Bodies.

- Francisco Javier García Vieira, Director of Digital Public Services of Red.es, a public entity promoter of the Digital Agenda in Spain. The Public service area works in three areas: in education, with Educa en Digital and the Educational Posts at Home; in health, with chronicity projects in Andalusia and Extremadura and with a whole range of local and provincial developments through the Smart Territories.

- María Fernández Rancaño, Deputy Director of Digital Public Services of Red.es, unit in charge of the deployment of technological implementation programs in public services of the Administration.

- Zaida Sampedro Préstamo, Deputy Director General of Services to Ministries and Digital Administration of Madrid Digital, the Agency for Digital Administration of the Community of Madrid.

Entities in the field of health

Given the sectorial nature of the Challenge, representatives of organizations linked to health and well-being have been invited to form part of the jury.

- Carlos Gallego Pérez, Director of Area IA of the Tic Salut Social Foundation of the Department of Health, of the Generalitat de Catalunya. This organismpromotes the development and use of ICT in health and social welfare, functioning as an observatory of new trends and innovation. Among its projects we find initiatives to bring the health field Artificial intelligence and Emerging technologies like 5G.

- Carlos Luis Parra Calderon, Head of the Technological Innovation Section of the Virgen del Rocío University Hospital of the Andalusian Health Service. This center has an R&D&I area focused on projects such as Learning Health Systems, Language Technologies or Big Data for Healthcare Management, among others.

- Noemí Cívicos Villa, General Director of Digital Health and Information Systems for the National Health System of the Ministry of Health. The National Health System encompasses health benefits and services in Spain.

Business associations

The Aporta Challenge seeks to highlight the power of data as the basis for business models that drive the economy. Therefore, the representatives of business entities could not be absent from the jury.

- Antonio Cimorra Boats, Director of Digital Transformation and Enabling Technologies of Ametic (Multisectoral Association of Information Technology, Communications and Electronics Companies). This association represents companies of all sizes linked to the Spanish digital technology industry.

- Olga Quirós Bonet, Secretary General of ASEDIE (Multisectoral Information Association). ASEDIE represents infomediary companies that, from different sectors, reuse information to create value-added products and services.

- Víctor María Calvo-Sotelo Ibáñez-Martín, Managing Director of DigitalES (Spanish Association for Digitization), which brings together companies present throughout the digital value chain. DigitalES is part of the Advisory Council for the Digital Transformation of the Government and is a member of the CEOE board of directors.

Universities and data communities

Students and developers are, among others, two of the target audiences of this competition, and for this reason it was also important to have the participation of data communities and universities.

- Emilio López Cano, Contracted Professor of the Higher Technical School of Computer Engineering of the Rey Juan Carlos University of Madrid. Emilio is also the President of R-Hispano, a community of users and developers whose objective is to promote knowledge and use of the R programming language.

- Fernando Diaz de Maria, Professor and Head of the Multimedia Processing Group of the Higher Polytechnic School of the Carlos III University of Madrid. This university has an attractive range of data-related courses at both undergraduate and postgraduate level.

- Maria Sanchez Gonzalez, Associate Professor of the Department of Journalism at the University of Malaga and co-organizer of DataBeers Malaga, a non-profit initiative specialized in dynamic events related to the universe of data, including open data.

The Secretariat of the Jury, with voice and vote, falls to Sonia Castro García-Muñoz, Coordinator of the Aporta Initiative in the Directorate of Digital Public Services of Red.es.

Jury of the IV edition of the Aporta Challenge: "The value of data for the health and well-being of citizens"

The registration closing date has been extended to February 15

In the same resolution, the closing date for submitting proposals has also been extended to February 15, 2022 at 1:00 p.m. Citizens who wish to participate in the Challenge must submit, before that date, an idea for a solution that promotes improvements in the field of health and well-being, using at least one dataset generated by Public Administrations, whether national or international.

All the available information is published, together with the bases, in the section Aporta Challenge.

Documentation

1. Introduction 

Data visualization is a task linked to data analysis that aims to represent the underlying information graphically. Visualizations play a fundamental role in communicating data, since they allow conclusions to be drawn in a visual and understandable way and help to detect patterns, trends and anomalous data or to project predictions, among many other functions. This makes them applicable to any process that involves data. The possibilities are very broad, from basic representations such as line, bar or pie charts to complex visualizations configured on interactive dashboards.

Before building an effective visualization, the data must be pre-processed, paying attention to how they were collected and validating their content, to ensure that they are free of errors and in an adequate and consistent format for processing. This prior treatment is essential for any task related to data analysis and effective visualization.

We will periodically present a series of practical exercises on visualizing open data available on the datos.gob.es portal and in other similar catalogues. In them, we describe in a simple way the steps needed to obtain the data and to perform the transformations and analysis relevant to creating interactive visualizations, from which we can extract as much information as possible and summarise it in final conclusions. Each exercise uses simple, conveniently documented code and relies on free tools. The material created will be available for reuse in the Data Lab on GitHub.

Visualization

Screenshot of the video showing interaction with the dashboard on the characterization of employment demand and registered contracts in Spain, available at the end of this article

2. Objectives

The main objective of this post is to create an interactive visualization using open data. For this purpose, we have used datasets containing relevant information on the evolution of employment demand in Spain over recent years. Based on these data, we have built a profile of employment demand in our country, specifically investigating how the gender gap affects this group and the impact of variables such as age, unemployment benefits or region.

3. Resources

3.1. Datasets

For this analysis we have selected datasets published by the Public State Employment Service (SEPE), coordinated by the Ministry of Labour and Social Economy, which collects time series data with distinct breakdowns that facilitate the analysis of the qualities of job seekers. These data are available on datos.gob.es, with the following characteristics:

3.2. Tools

R (version 4.0.3) and RStudio with the RMarkdown add-on have been used to carry out this analysis (working environment, programming and drafting).

RStudio is an open source integrated development environment for the R programming language, dedicated to statistical analysis and the creation of graphs.

RMarkdown allows the creation of reports that integrate text, code and dynamic results into a single document.

To create the interactive graphs, we have used the Kibana tool.

Kibana is an open source application that forms part of the Elastic Stack (Elasticsearch, Beats, Logstash and Kibana), which provides visualization and exploration capabilities for the data indexed in the Elasticsearch analytics engine. The main advantages of this tool are:

  • It presents visual information through interactive and customisable dashboards using time intervals, faceted range filters and geospatial coverage, among others.
  • It contains a catalogue of development tools (Dev Tools) to interact with the data stored in Elasticsearch.
  • It has a free version ready to use on your own computer and an enterprise version that is deployed in the Elastic cloud and on other cloud infrastructures, such as Amazon Web Services (AWS).

On the Elastic website you can find user manuals for downloading and installing the tool, as well as for creating graphs, dashboards, etc. Elastic also offers short videos on its YouTube channel and organizes webinars that explain different aspects of the Elastic Stack.

If you want to learn more about these and other tools which may help you with data processing, see the report “Data processing and visualization tools” that has been recently updated. 

4. Data processing

To create a visualization, it's necessary to prepare the data properly by performing a series of tasks that include pre-processing and exploratory data analysis (EDA), in order to better understand the data we are dealing with. The objective is to identify the characteristics of the data and to detect possible anomalies or errors that could affect the quality of the results. Data pre-processing is essential to ensure the consistency and effectiveness of the analysis or visualizations created afterwards.

To support readers who are not specialised in programming, the R code included below, which can be accessed by clicking on the “Code” button, is designed to be easy to understand rather than efficient. Readers who are more advanced in this programming language may therefore prefer to code some of the functionality in a different way. Readers can reproduce this analysis if they wish, as the source code is available on the datos.gob.es GitHub account. The code is provided as an RMarkdown document. Once it's loaded into the development environment, it can easily be run or modified.

4.1. Installation and import of libraries

The R base package, which is always available when the RStudio console is open, includes a wide set of functions to import data from external sources, carry out statistical analysis and obtain graphic representations. However, many tasks require additional packages, whose functions and objects are incorporated into the working environment. Some of them are already available in the system, but others must be downloaded and installed.

# Package installation
# The dplyr package provides a collection of functions for simple data manipulation
if (!requireNamespace("dplyr", quietly = TRUE)) {install.packages("dplyr")}
# The lubridate package handles date-type variables
if (!requireNamespace("lubridate", quietly = TRUE)) {install.packages("lubridate")}
# Load the packages into the development environment
library(dplyr)
library(lubridate)

4.2. Data import and cleansing

a. Import of datasets

The data used for the visualization are split by year into .CSV and .XLS files. All the files of interest should be imported into the development environment. To make this post easier to follow, the code below shows how a single .CSV file is loaded into a data table.

To speed up loading in the development environment, the datasets required for this visualization should first be downloaded to the working directory. The datasets are available on the datos.gob.es GitHub account.

# Load the 2020 dataset of job seekers by municipality
Demandantes_empleo_2020 <-
  read.csv("Conjuntos de datos/Demandantes de empleo por Municipio/Dtes_empleo_por_municipios_2020_csv.csv",
           sep = ";", skip = 1, header = TRUE)

Once all the datasets are uploaded as data tables in the development environment, they need to be merged in order to obtain a single dataset that includes all the years of the time series, for each of the characteristics related to job seekers that will be analysed: number of job seekers, unemployment expenditure and new contracts registered by SEPE. 

# Dataset of job seekers
Datos_desempleo <- rbind(Demandantes_empleo_2006, Demandantes_empleo_2007, Demandantes_empleo_2008, Demandantes_empleo_2009,
                         Demandantes_empleo_2010, Demandantes_empleo_2011, Demandantes_empleo_2012, Demandantes_empleo_2013,
                         Demandantes_empleo_2014, Demandantes_empleo_2015, Demandantes_empleo_2016, Demandantes_empleo_2017,
                         Demandantes_empleo_2018, Demandantes_empleo_2019, Demandantes_empleo_2020)
# Dataset of expenditure on unemployment benefits
gasto_desempleo <- rbind(gasto_2010, gasto_2011, gasto_2012, gasto_2013, gasto_2014, gasto_2015,
                         gasto_2016, gasto_2017, gasto_2018, gasto_2019, gasto_2020)
# Dataset of new contracts for job seekers
Contratos <- rbind(Contratos_2006, Contratos_2007, Contratos_2008, Contratos_2009, Contratos_2010, Contratos_2011, Contratos_2012, Contratos_2013,
                   Contratos_2014, Contratos_2015, Contratos_2016, Contratos_2017, Contratos_2018, Contratos_2019, Contratos_2020)
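Loading and binding the yearly files one by one is repetitive. As a minimal sketch, assuming that all yearly files of one kind sit in the same folder and share the same columns, the import and the merge could be done in a single pass with base R. The helper name `read_yearly_files`, the demo folder and the demo column names are hypothetical and not part of the original code; `sep = ";"` and `skip = 1` are taken from the import shown above.

```r
# Hypothetical helper: read every CSV in a folder and stack the rows.
# Assumes all files have identical column names, as rbind requires.
read_yearly_files <- function(folder, pattern = "\\.csv$") {
  files <- sort(list.files(folder, pattern = pattern, full.names = TRUE))
  tables <- lapply(files, read.csv, sep = ";", skip = 1, header = TRUE)
  do.call(rbind, tables)
}

# Small demonstration with two temporary files that mimic the layout:
# one title line to skip, then the header and the data (invented values).
demo_dir <- file.path(tempdir(), "demo_sepe")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("title 2019", "Codigo.mes;Provincia;total", "201901;Soria;10"),
           file.path(demo_dir, "demo_2019.csv"))
writeLines(c("title 2020", "Codigo.mes;Provincia;total", "202001;Soria;12"),
           file.path(demo_dir, "demo_2020.csv"))
demo_all <- read_yearly_files(demo_dir)
```

With the real files, the same helper would be called once per folder (job seekers, expenditure and contracts), provided the .XLS files have first been converted to .CSV or are read with a suitable package.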

b. Selection of variables

Once the tables with three time series are obtained (number of job seekers, unemployment expenditure and new registered contracts), the variables of interest will be extracted and included in a new table. 

First, the tables with job seekers (“Datos_desempleo”) and new registered contracts (“Contratos”) should be aggregated by province, to facilitate the visualization. They should match the breakdown by province of the unemployment benefits expenditure table (“gasto_desempleo”). In this step, only the variables of interest will be selected from the three datasets.

# Group the "Datos_desempleo" dataset and sum the numerical variables of interest by several categorical variables
Dtes_empleo_provincia <- Datos_desempleo %>%
  group_by(Código.mes, Comunidad.Autónoma, Provincia) %>%
  summarise(total.Dtes.Empleo = (sum(total.Dtes.Empleo)), Dtes.hombre.25 = (sum(Dtes.Empleo.hombre.edad...25)),
            Dtes.hombre.25.45 = (sum(Dtes.Empleo.hombre.edad.25..45)), Dtes.hombre.45 = (sum(Dtes.Empleo.hombre.edad...45)),
            Dtes.mujer.25 = (sum(Dtes.Empleo.mujer.edad...25)), Dtes.mujer.25.45 = (sum(Dtes.Empleo.mujer.edad.25..45)),
            Dtes.mujer.45 = (sum(Dtes.Empleo.mujer.edad...45)))
# Group the "Contratos" dataset and sum the numerical variables of interest by the categorical variables
Contratos_provincia <- Contratos %>%
  group_by(Código.mes, Comunidad.Autónoma, Provincia) %>%
  summarise(Total.Contratos = (sum(Total.Contratos)),
            Contratos.iniciales.indefinidos.hombres = (sum(Contratos.iniciales.indefinidos.hombres)),
            Contratos.iniciales.temporales.hombres = (sum(Contratos.iniciales.temporales.hombres)),
            Contratos.iniciales.indefinidos.mujeres = (sum(Contratos.iniciales.indefinidos.mujeres)),
            Contratos.iniciales.temporales.mujeres = (sum(Contratos.iniciales.temporales.mujeres)))
# Select the variables of interest from the "gasto_desempleo" dataset
gasto_desempleo_nuevo <- gasto_desempleo %>% select(Código.mes, Comunidad.Autónoma, Provincia, Gasto.Total.Prestación, Gasto.Prestación.Contributiva)
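To see what the aggregation by province does, the following base R sketch sums a toy municipal table by month and province. It uses `aggregate()` instead of dplyr purely as an illustration and as a possible cross-check of the totals; the data frame `toy` and its values are invented, and the column names are simplified stand-ins for the real ones.

```r
# Toy municipal data: two municipalities in Soria, one in Teruel (invented values)
toy <- data.frame(
  Codigo.mes = c(202001, 202001, 202001),
  Provincia  = c("Soria", "Soria", "Teruel"),
  total      = c(10, 5, 7)
)

# Sum the numeric column for each combination of month and province
toy_prov <- aggregate(total ~ Codigo.mes + Provincia, data = toy, FUN = sum)
```

The result has one row per month and province, which is the same granularity as the expenditure table and therefore makes the later merge possible.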

Secondly, the three tables should be merged into a single table that we will work with from this point onwards.

Caract_Dtes_empleo <- Reduce(merge, list(Dtes_empleo_provincia, gasto_desempleo_nuevo, Contratos_provincia))
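It is worth knowing how `merge` behaves when chained with `Reduce`: by default it joins on all columns the tables have in common and keeps only the rows present in every table. The sketch below illustrates this with three invented toy tables (simplified column names, not the real data); a month that is missing from one table disappears from the result.

```r
# Three toy tables sharing the key columns Codigo.mes and Provincia (invented values)
a  <- data.frame(Codigo.mes = c(200901, 201001), Provincia = "Soria", demandantes = c(8, 10))
b  <- data.frame(Codigo.mes = 201001, Provincia = "Soria", gasto = 3)
c3 <- data.frame(Codigo.mes = c(200901, 201001), Provincia = "Soria", contratos = c(2, 4))

# merge() is applied pairwise: merge(merge(a, b), c3)
merged <- Reduce(merge, list(a, b, c3))
```

Here only the 201001 row survives, because 200901 has no match in `b`. Applied to the real tables, this would mean that years covered by only some of the three time series are dropped from the merged table.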

 

c. Transformation of variables

Once the table with the variables of interest has been created for further analysis and visualization, some of the variables should be converted to other types that are better suited to later aggregations.

# Transform the month code (e.g. 200601) into a date variable
Caract_Dtes_empleo$Código.mes <- as.character(Caract_Dtes_empleo$Código.mes)
# Append day "01" so that the code can be parsed as a full year-month-day date
Caract_Dtes_empleo$Código.mes <- parse_date_time(paste0(Caract_Dtes_empleo$Código.mes, "01"), orders = "Ymd")
# Transform to numeric variables
Caract_Dtes_empleo$Gasto.Total.Prestación <- as.numeric(Caract_Dtes_empleo$Gasto.Total.Prestación)
Caract_Dtes_empleo$Gasto.Prestación.Contributiva <- as.numeric(Caract_Dtes_empleo$Gasto.Prestación.Contributiva)
# Transform to factor variables
Caract_Dtes_empleo$Provincia <- as.factor(Caract_Dtes_empleo$Provincia)
Caract_Dtes_empleo$Comunidad.Autónoma <- as.factor(Caract_Dtes_empleo$Comunidad.Autónoma)

d. Exploratory analysis

Let's see which variables the new dataset contains and how it is structured.

str(Caract_Dtes_empleo)
summary(Caract_Dtes_empleo)

The output of this portion of the code is omitted to facilitate reading. Main characteristics presented in the dataset are as follows: 

  • The time range covers the period from January 2010 to December 2020.
  • The number of columns (variables) is 17.
  • It presents two categorical variables (“Provincia”, “Comunidad.Autónoma”), one date variable (“Código.mes”) and the rest are numerical variables.

e. Detection and processing of missing data

Next, we will check whether the dataset has missing values (NAs). Treating or removing NAs is essential; otherwise the numerical variables cannot be processed properly.

any(is.na(Caract_Dtes_empleo))
# As the result is TRUE, we remove the missing data from the dataset, since we do not know why these values are missing
Caract_Dtes_empleo <- na.omit(Caract_Dtes_empleo)
any(is.na(Caract_Dtes_empleo))
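Before dropping rows, it can help to see where the missing values are and how many rows will be lost. The following is a minimal sketch on an invented toy table, not on the real dataset; the same two checks could be applied to `Caract_Dtes_empleo`.

```r
# Toy table with one missing value in each column (invented values)
toy <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

# Number of missing values per column
na_per_column <- colSums(is.na(toy))

# Number of rows that na.omit() removes
rows_before  <- nrow(toy)
toy_clean    <- na.omit(toy)
rows_dropped <- rows_before - nrow(toy_clean)
```

If most of the missing values are concentrated in a single variable, dropping that variable instead of the affected rows may preserve more of the time series; which option is preferable depends on the analysis.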

4.3. Creation of new variables

To create the visualization, we are going to derive a new variable from two variables present in the data table. This operation is very common in data analysis, as it is sometimes more useful to work with calculated data (e.g., the sum or the average of different variables) than with the source data. In this case, we will calculate the average unemployment expenditure per job seeker. For this purpose, we will use the total expenditure on benefits (“Gasto.Total.Prestación”) and the total number of job seekers (“total.Dtes.Empleo”).

Caract_Dtes_empleo$gasto_desempleado <-
  (1000 * (Caract_Dtes_empleo$Gasto.Total.Prestación /
     Caract_Dtes_empleo$total.Dtes.Empleo))
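A ratio like this one produces `Inf` or `NaN` whenever the denominator is zero. As a hedged sketch on invented vectors (the names `gasto`, `demandantes` and `gasto_por_demandante` are illustrative only), the division can be guarded so that such cases become `NA` and can be handled like any other missing value.

```r
# Invented values: the second and third provinces have no job seekers
gasto       <- c(50, 20, 0)
demandantes <- c(100, 0, 0)

# Same formula as above, but returning NA when the denominator is zero
gasto_por_demandante <- ifelse(demandantes > 0,
                               1000 * gasto / demandantes,
                               NA)
```

Whether the real table contains provinces and months with zero job seekers is not stated in this post, so this guard is a precaution rather than a correction.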

4.4. Save the dataset

Once the table containing the variables of interest for analysis and visualization has been obtained, we will save it as a CSV file so that it can later be used for other statistical analyses or in other data processing or visualization tools. It's important to use UTF-8 encoding (Unicode Transformation Format) so that special characters are identified correctly by any other tool.

write.csv(Caract_Dtes_empleo,
          file = "Caract_Dtes_empleo_UTF8.csv",
          fileEncoding = "UTF-8")
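By default, `write.csv` also writes the row names as an extra, unnamed first column, which other tools may then treat as one more field. The sketch below, on an invented toy table written to a temporary file, shows the optional `row.names = FALSE` argument and a quick read-back check; it is a suggestion, not the code used in the original exercise.

```r
# Toy table with accented province names (invented values);
# \u escapes keep the source file itself ASCII-only
toy <- data.frame(Provincia = c("\u00c1vila", "C\u00e1diz"), total = c(1, 2))

# Write without the row-name column, using UTF-8 encoding
out_file <- file.path(tempdir(), "toy_UTF8.csv")
write.csv(toy, file = out_file, fileEncoding = "UTF-8", row.names = FALSE)

# Read the file back to confirm the columns and values survived
toy_back <- read.csv(out_file, fileEncoding = "UTF-8")
```

Comparing `toy_back$Provincia` with the original strings is a simple way to confirm that the accented characters were preserved on a given system.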

5. Creation of a visualization on the characteristics of employment demand in Spain using Kibana

This interactive visualization was developed using Kibana in a local environment. We followed Elastic's tutorial for both downloading and installing the software.

Below you can find a tutorial video covering the whole process of creating the visualization. The video shows how a dashboard with different interactive indicators is created by generating graphic representations of different types. The steps to build the dashboard are as follows:


  1. Load the data into Elasticsearch and generate an index that makes it possible to interact with the data from Kibana. This index allows the data in the loaded files to be searched and managed practically in real time.
  2. Generate the following graphic representations:
    • Line graph representing the time series of job seekers in Spain between 2006 and 2020.
    • Pie chart with job seekers broken down by province and Autonomous Community.
    • Thematic map showing the number of new contracts registered in each province. To create this visual, it's necessary to download a dataset with the georeferencing of the provinces, published on the open data portal Open Data Soft.
  3. Build a dashboard.

Below you can find a video showing interaction with the visualization that we have just created:

6. Conclusions

Looking at the visualization of the data related to the profile of job seekers in Spain during the years 2010-2020, the following conclusions may be drawn, among others: 

  • There are two significant increases in the number of job seekers. The first, approximately in 2010, coincides with the economic crisis. The second, much more pronounced, in 2020, coincides with the pandemic crisis.
  • A gender gap may be observed in the group of job seekers: the number of female job seekers is higher throughout the time series, mainly in the age groups above 25.
  • At the regional level, Andalusia, followed by Catalonia and Valencia, are the Autonomous Communities with the highest number of job seekers. In contrast to Andalusia, which is an Autonomous Community with the lowest unemployment expenditure, Catalonia presents the highest value.   
  • Temporary contracts predominate. The provinces that generate the highest number of contracts are Madrid and Barcelona, which are also the most populated, while the provinces with the lowest number of contracts are Soria, Ávila, Teruel and Cuenca, which are among the most depopulated areas of Spain.

This visualization has helped us to synthesise a large amount of information and give it meaning, allowing us to draw conclusions and, if necessary, make decisions based on the results. We hope you like this new post. We will be back to present new reuses of open data. See you soon!

News

We live in a connected world, where we all carry a mobile device that allows us to capture our environment and share it with whoever we want through social networks or other tools. This allows us to stay in contact with our loved ones even if we are thousands of kilometers away, but what if we also took advantage of this circumstance to enrich scientific research? We would be talking about what is known as citizen science.

Citizen science seeks "general public engagement in scientific research activities when citizens actively contribute to science either with their intellectual effort or surrounding knowledge or with their tools and resources". This definition is taken from the Green Paper on Citizen Science, developed in the framework of the European project Socientize (FP7), and explains some of the keys to citizen science. In particular, citizen science is:

  • Participatory: Citizens of all types can collaborate in different ways, by collecting information or by making their experience and knowledge available to the research. This mixture of profiles creates a perfect atmosphere for innovation and new discoveries.
  • Voluntary: Given that participation is often altruistic, citizen science projects need to be aligned with the demands and interests of society. For this reason, projects that awaken the social conscience of citizens (for example, those related to environmentalism) are common.
  • Efficient: Thanks to the technological advances that we mentioned at the beginning, samples of the environment can be captured with greater ubiquity and immediacy. In addition, technology facilitates the interconnection, and with it the cooperation, of companies, researchers and civil society. All this reduces costs and produces results quickly.
  • Open: The data, metadata and publications generated during the research are published in open and accessible formats. This makes the information easier to reuse and makes it easier to repeat the research to verify its accuracy and soundness.

In short, this type of initiative seeks to generate a more democratic science that responds to the interests of all those involved, above all those of citizens, and that generates information that can be reused for the benefit of society. Let's see some examples:

  • Mosquito Alert: This project seeks to fight against the tiger mosquito and the yellow fever mosquito, species that transmit diseases such as Zika, Dengue or Chikungunya. In this case, citizen participation consists of sending photographs of insects observed in the environment that are likely to belong to these species. A team of professionals analyzes the images to validate the findings. The data generated make it possible to monitor these species and make predictions about their behavior, which helps to control their expansion. All this information is shared openly through GBIF España.
  • Sponsor a rock: With the objective of favoring the conservation of the Spanish geological heritage, the participants in this project commit to visit, at least once a year, the place of geological interest that they have sponsored. They will have to warn of any action or threat that they observe (anomalies, aggressions, pillaging of minerals or fossils ...). The information will help enrich the Spanish Inventory of Places of Geological Interest.
  • RitmeNatura.cat: The project consists of following the seasonal changes in plants and animals: when flowering occurs, the appearance of new insects, changes in bird migration ... The objective is to monitor the effects of climate change. The results can be downloaded in this link.
  • Identification of near-Earth asteroids: Participants in the project will help identify asteroids using astronomical images. The Minor Planet Center (the body of the International Astronomical Union responsible for the minor bodies of the Solar System) will evaluate the data to improve the orbits of these objects and estimate more accurately the probability of a possible impact with the Earth. You can see some of the results here.
  • Arturo: One area where citizen science can bring great advantages is the training of artificial intelligence. This is the case of Arturo, a machine learning algorithm designed to determine which urban conditions are optimal. To do this, collaborators must answer a questionnaire in which they choose the images that best fit their concept of a habitable environment. The objective is to help technicians and administrations to create environments aligned with the needs of citizens. The data generated and the model used can be downloaded at the following link.

If you are interested in learning about more projects of this type, you can visit the Spanish Citizen Science webpage, whose objective is to increase knowledge and awareness of citizen science. It involves the Ministry of Science, Innovation and Universities, the Spanish Foundation for Science and Technology and the Ibercivis Foundation. A quick look at the projects section will show you what kind of activities are being carried out. Maybe you will find one that interests you...

 

Reusing company

portalestadistico.com integrates and disseminates official statistics from multiple sources for each of the territories that make up Spain. They offer interactive dashboards and visual data analysis tools, thus promoting the reuse of public information and multiplying the possibilities of the data.

In short, they help local administrations to be more efficient and transparent by disseminating open intelligent data related to their territories.
