Many people use apps to get around in their daily lives. Apps such as Google Maps, Moovit or CityMapper provide the fastest and most efficient route to a destination. However, what many users are unaware of is that behind these platforms lies a valuable source of information: open data. By reusing public datasets, such as those related to air quality, traffic or public transport, these applications can provide a better service.
In this post, we will explore how the reuse of open data by these platforms empowers a smarter and more sustainable urban ecosystem.
Google Maps: aggregates air quality information and transport data into GTFS.
More than a billion people use Google Maps every month around the world. The tech giant offers a free, up-to-date world map that draws its data from a variety of sources, some of them open.
One of the functions provided by the app is information about the air quality in the user's location. The Air Quality Index (AQI) is a parameter that is determined by each country or region. The European benchmark can be consulted on this map which shows air quality by geolocated zones in real time.
To display the air quality of the user's location, Google Maps applies a model based on a multi-layered approach known as the "fusion approach". This method combines data from several input sources and weights the layers with a sophisticated procedure. The input layers are:
- Government reference monitoring stations
- Commercial sensor networks
- Global and regional dispersion models
- Dust and smoke fire models
- Satellite information
- Traffic data
- Ancillary information such as surface area
- Meteorology
In the case of Spain, this information is obtained from open data sources such as the Ministry of Ecological Transition and Demographic Challenge, the Regional Ministry of Environment, Territory and Housing of the Xunta de Galicia or the Community of Madrid. Open data sources used in other countries around the world can be found here.
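The idea of weighting several input layers can be illustrated with a minimal sketch. It assumes the simplest possible combination, a weighted average over the layers that reported a value. The layer names, readings and weights are invented for illustration, and the sketch does not describe the procedure Google Maps actually applies.

```python
# Illustrative only: layer names, readings and weights are invented.
def fuse_layers(readings, weights):
    """Weighted average of the AQI estimates from the layers that reported a value."""
    # Ignore layers that have no reading for this location.
    available = {name: value for name, value in readings.items() if value is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        raise ValueError("no usable layers")
    # Renormalise by the weight of the layers that are actually present.
    return sum(weights[name] * value for name, value in available.items()) / total_weight


readings = {"reference_station": 42.0, "commercial_sensor": 50.0, "satellite": None}
weights = {"reference_station": 0.6, "commercial_sensor": 0.3, "satellite": 0.1}
fused = fuse_layers(readings, weights)
```

Renormalising by the weights of the available layers keeps the result on the same scale when one source, here the hypothetical satellite layer, is missing.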
Another functionality offered by Google Maps to plan the best routes to reach a destination is the information on public transport. These data are provided on a voluntary basis by the public companies providing transport services in each city. In order to make this open data available to the user, it is first dumped into Google Transit and must comply with the open public transport standard GTFS (General Transit Feed Specification).
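To give an idea of what a GTFS feed looks like, here is a minimal sketch that reads two of the text files defined by the specification, `stops.txt` and `stop_times.txt`, and lists the stops of one trip in order. The column names come from the GTFS specification; the rows, the stop names and the helper functions are invented for illustration.

```python
import csv
import io

# Tiny in-memory stand-ins for two GTFS files; the rows are invented.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Central Station,40.4168,-3.7038
S2,City Hall,40.4180,-3.7100
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:07:00,08:07:30,S2,2
T1,08:00:00,08:00:00,S1,1
"""


def read_gtfs_table(text):
    """Parse one GTFS text file (comma-separated with a header row) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def trip_itinerary(trip_id, stops, stop_times):
    """Return (stop_name, arrival_time) pairs for a trip, in stop_sequence order."""
    names = {row["stop_id"]: row["stop_name"] for row in stops}
    rows = [row for row in stop_times if row["trip_id"] == trip_id]
    # stop_sequence is stored as text, so convert it before sorting.
    rows.sort(key=lambda row: int(row["stop_sequence"]))
    return [(names[row["stop_id"]], row["arrival_time"]) for row in rows]


stops = read_gtfs_table(STOPS_TXT)
stop_times = read_gtfs_table(STOP_TIMES_TXT)
itinerary = trip_itinerary("T1", stops, stop_times)
```

Because every operator publishes the same files with the same column names, a routing application can reuse this kind of parsing code across cities.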
Moovit: reusing open data to deliver real-time information
Moovit is another of the urban mobility apps most used by Spaniards. It uses open and collaborative data to make it easier for users to plan their journeys by public transport.
Since its launch in 2012, the free-to-download app has offered real-time information on the different transport options, suggested the best routes to reach the indicated destination, guided users during their journey (how long they have to wait, how many stops are left, when they have to get off, etc.) and provided constant updates in the event of any alteration in the service.
Like other mobility apps, it is also available in offline mode and allows you to save routes and frequent lines in "Favourites". It is also an inclusive solution, as it integrates VoiceOver (iOS) or TalkBack (Android) for blind people.
The platform not only leverages open data provided by governments and local authorities, but also collects information from its users, allowing it to offer a dynamic and constantly updated service.
CityMapper: born as a reuser of open mobility data
The CityMapper development team recognises that the application was born with an open DNA that still remains. They reuse open datasets from, for example, OpenStreetMap at global level or RENFE and Cercanías Bilbao at national level. As the application becomes available in more cities, the list of open data reference sources from which it draws information grows.
The platform offers real-time information on public transport routes, including bus, train, metro and bike sharing. It also adds options for walking, cycling and ridesharing. It is designed to provide the most efficient and fastest route to a destination by integrating data from different modes of transport into a single interface.
As we published in the monographic report "Municipal Innovation through Open Data", CityMapper mainly uses open data from local transport authorities, typically using the GTFS (General Transit Feed Specification) standard. However, when this data is not sufficient or accurate enough, CityMapper combines it with datasets generated by the application's own users, who collaborate voluntarily. It also uses data enhanced and managed by the company's own local employees. All this data is combined with artificial intelligence algorithms developed to optimise routes and provide recommendations tailored to users' needs.
In conclusion, the use of open data in transport is driving a significant transformation in the mobility sector in cities. Through their contribution to applications, users can access up-to-date and accurate data, plan their journeys efficiently and make informed decisions. Governments, for their part, have taken on the role of facilitators by enabling the dissemination of data through open platforms, optimising resources and fostering collaboration across sectors. In addition, open data has created new opportunities for developers and the private sector, who have contributed with technological solutions such as Google Maps, Moovit or CityMapper. Ultimately, the potential of open data to transform the future of urban mobility is undeniable.
The digital revolution is transforming municipal services, driven by the increasing adoption of artificial intelligence (AI) technologies that also benefit from open data. These developments have the potential to redefine the way municipalities deliver services to their citizens, providing tools to improve efficiency, accessibility and sustainability. This report looks at success stories in the deployment of applications and platforms that seek to improve various aspects of life in municipalities, highlighting their potential to unlock more of the vast untapped potential of open data and associated artificial intelligence technologies.
The applications and platforms described in this report have a high potential for replicability in different municipal contexts, as they address common problems. Replication of these solutions can take place through collaboration between municipalities, companies and developers, as well as through the release and standardisation of open data.
Despite the benefits, the adoption of open data for municipal innovation also presents significant challenges. The quality, updating and standardisation of data published by local authorities, as well as interoperability between different platforms and systems, must be ensured. In addition, the open data culture needs to be reinforced among all actors involved, including citizens, developers, businesses and public administrations themselves.
The use cases analysed are divided into four sections. Each of these sections is described below and some examples of the solutions included in the report are shown.
Transport and Mobility
One of the most significant challenges in urban areas is transport and mobility management. Applications using open data have proven to be effective in improving these services. For example, applications such as Park4Dis make it easy to locate parking spaces for people with reduced mobility, using data from multiple municipalities and contributions from volunteers. CityMapper, which has gone global, offers optimised public transport routes in real time, integrating data from various transport modes to provide the most efficient route. These applications not only improve mobility, but also contribute to sustainability by reducing congestion and carbon emissions.
Environment and Sustainability
Growing awareness of sustainability has spurred the development of applications that promote environmentally friendly practices. CleanSpot, for example, facilitates the location of recycling points and the management of municipal waste. The application encourages citizen participation in cleaning and recycling, contributing to the reduction of the ecological footprint. Liight gamifies sustainable behaviour by rewarding users for actions such as recycling or using public transport. These applications not only improve environmental management, but also educate and motivate citizens to adopt more sustainable habits.
Optimisation of Basic Public Services
Urban service management platforms, such as Gestdropper, use open data to monitor and control urban infrastructure in real time. These tools enable more efficient management of resources such as street lighting, water networks and street furniture, optimising maintenance and incident response and reducing operating costs. Moreover, the deployment of appointment management systems, such as CitaME, helps to reduce waiting times and improve efficiency in customer service.
Citizen Services Aggregators
Applications that centralise public information and services, such as Badajoz Es Más and AppValencia, improve accessibility and communication between administrations and citizens. These platforms provide real-time data on public transport, cultural events, tourism and administrative procedures, making life in the municipality easier for residents and tourists alike. For example, integrating multiple services into a single application improves efficiency and reduces the need for unnecessary travel. These tools also support local economies by promoting cultural events and commercial services.
Conclusions
The use of open data and artificial intelligence technologies is transforming municipal management, improving the efficiency, accessibility and sustainability of public services. The success stories presented in this report describe how these tools can benefit both citizens and public administrations by making cities smarter, more inclusive and sustainable environments, and more responsive to the needs and well-being of their inhabitants and visitors.
Listen to the podcast (only available in Spanish)
Spain, as part of the European Union, is committed to the implementation of the European directives on open data and re-use of public sector information. This includes the adoption of initiatives such as the Implementing Regulation (EU) 2023/138 issued by the European Commission, which defines specific guidelines for government entities with regard to the availability of high-value datasets (HVD). These data are categorised into themes detailed in earlier posts: Geospatial, Earth Observation and Environment, Meteorology, Statistics, Companies and Company Ownership, and Mobility. In this article we will focus on the last of these categories.
The Mobility category encompasses data collections falling under the domain of "Transport Networks", as demarcated in Annex I of the Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). In particular, this Directive refers to the requirement to make available to users datasets relating to road, rail, air and inland waterway networks, with their associated infrastructure, connections between different networks and the trans-European transport network, as defined by Decision No 1692/96/EC of the European Parliament and of the Council of 23 July 1996 on Community guidelines for the development of the trans-European transport network.
In addition, it includes the datasets as described in the Directive 2005/44/EC of the European Parliament and of the Council of 7 September 2005 on harmonised River Information Services (RIS) on inland waterways in the Community. The main objective of the Directive is to improve inland waterway traffic and transport, and it applies to canals, rivers, lakes and ports capable of accommodating vessels of between 1,000 and 1,500 tonnes. These datasets include:
| Data type | Inland waterways datasets |
|---|---|
| Static data | |
| Dynamic data | |
| Inland electronic and navigational charts (Inland ENC according to the Inland ECDIS Standard) | |
Figure 1: Table with the high value datasets related to Directive 2005/44/EC for the creation of a trans-European river information network.
In order for all of us to make the most of the information available, the Regulation defines some basic rules on how this data is shared:
- Free and easy to use. The data must be ready to be used and shared with everyone for any purpose by acknowledging and citing the source of the data, as prescribed by the Creative Commons BY 4.0 licence.
- Easy to read and use. Data will be presented in a way that both people and computers can easily understand, and everything will be documented publicly.
- Direct and easy access. There will be special ways (called APIs) that allow programs to access data automatically. In addition, the user can alternatively download a lot of information at once.
- Always up to date. It is important that data is up to date, so there will be access to the most recent version. But if the user needs to access previous data, it will also be possible to view previous versions.
- Detailed and precise. Data will be shared in as much detail as possible, to a very fine level of accuracy, so that the whole territory is covered when combined.
- Information on information. There will be "information about the information" (metadata) that will tell everything about the data. The metadata shall contain at least the elements listed in the Annex to Commission Regulation (EC) No 1205/2008 of 3 December 2008.
- Understandable and orderly. The way the data is organised and what it all means (structure and semantics) will be explained in a way that is easy for everyone to understand.
- Common language. Data shall use vocabularies, code lists and categories that are recognised and accepted at European or global level.
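Several of these rules can be checked automatically before a dataset is published. The following is a minimal sketch of such a check, assuming a metadata record represented as a dictionary. The key names are hypothetical and only loosely mirror the rules above; they are not the list of elements in the Annex to Regulation (EC) No 1205/2008.

```python
# Hypothetical key names, chosen for illustration only.
REQUIRED_KEYS = ("title", "abstract", "keywords", "responsible_party", "licence")


def missing_metadata(record):
    """Return the required keys that are absent or empty in a metadata record."""
    return [key for key in REQUIRED_KEYS if not record.get(key)]


record = {
    "title": "Example transport network dataset",
    "abstract": "Invented record used only for this sketch.",
    "keywords": ["mobility"],
    "licence": "CC BY 4.0",
}
gaps = missing_metadata(record)
```

A publisher could run a check of this kind as part of its release process and refuse to publish while the list of gaps is not empty.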
In Spain, who is responsible for the creation and maintenance of mobility data?
In Spain, the responsibility for the creation and maintenance of mobility data generally lies with different governmental entities, depending on the type of mobility and the territorial scope:
- National level. The Ministry of Transport and Sustainable Mobility is the main body in charge of mobility in terms of infrastructure and transport at national level. This includes data on roads, railways, air and maritime transport.
- Regional and local level. Autonomous communities and municipalities also play an important role in urban and regional mobility. They are responsible for urban mobility, public transport and public roads, within their respective jurisdictions.
- Public business entities. Entities such as ADIF (Administrador de Infraestructuras Ferroviarias, the Railway Infrastructure Administrator), AENA (Aeropuertos Españoles y Navegación Aérea, Spanish Airports and Air Navigation), Puertos del Estado (State Ports) and others manage specific data related to their fields of action in rail, air and maritime transport, respectively.
In Spain, the Ministry of Transport and Sustainable Mobility, in collaboration with the autonomous communities, plays a key role in providing access to a wide range of mobility data. In compliance with INSPIRE and LISIGE (Law 14/2010 of 5 July 2010 on geographic information infrastructures and services in Spain, which transposes the INSPIRE Directive), it offers resources such as the Geoportal of the Spatial Data Infrastructure of Spain (IDEE in Spanish acronyms) where citizens and professionals can access geographic data and services, especially with regard to mobility.
Does Spain comply with the HVD Mobility Regulation?
To answer this question, we have to go to the INSPIRE Geoportal, where official information classified as high-value datasets in Europe is available, and specifically to the mobility category.

Figure 2: Screenshot of the INSPIRE Geoportal.
As of April 2024 Spain has published the following information in the INSPIRE Geoportal:
- Port service areas in Spain. The port service areas include the cartographic and alphanumeric information of the land service area and water areas I and II. The Spanish State-owned Port System is made up of 46 ports of general interest, managed by 28 Port Authorities.
- Spanish Transport Networks. The Transport Network of the Geographic Reference Information of the National Cartographic System of Spain is a three-dimensional network of national coverage, defined and published in accordance with the INSPIRE Directive, which contemplates five modes of transport: road, rail, inland waterways, air and cable, together with their respective intermodal connections and the infrastructures associated with each mode. This information has the linear geometry of the roads and the punctual geometry of the portals and kilometre points.
- ADIF's Spanish Rail Transport Network. Public geographic dataset on the adaptation of the Spanish ADIF Common Tramification to the INSPIRE regulations (Transport Networks Annex I).
The publication of these high-value datasets responds positively to the question of Spain's compliance with the HVD regulation, and is an achievement that reflects Spain's continued commitment to transparency and access to mobility data.
The joint effort between the Ministry of Transport, Mobility and Urban Agenda, the National Cartographic System and the Autonomous Communities and Public Business Entities underlines the importance of a collaborative approach to mobility information management.
The availability of this data highlights Spain's commitment to publishing high-value datasets and underlines the importance of continuously improving access to information to optimise inland navigation and mobility data.
Content prepared by Mayte Toscano, Senior Consultant in Data Economy Technologies. The contents and points of view reflected in this publication are the sole responsibility of its author.
1. Introduction
Visualisations are graphical representations of data that make it possible to communicate, in a simple and effective way, the information linked to the data. The visualisation possibilities are very wide ranging, from basic representations such as line graphs, bar charts or relevant metrics, to interactive dashboards.
In this "Step-by-Step Visualisations" section we regularly present practical exercises that make use of open data available at datos.gob.es or other similar catalogues. They address and describe in a simple way the steps necessary to obtain the data, carry out the relevant transformations and analyses, and finally draw conclusions, summarising the information.
Documented code developments and free-to-use tools are used in each practical exercise. All the material generated is available for reuse in the GitHub repository of datos.gob.es.
In this particular exercise, we will explore the current state of electric vehicle penetration in Spain and the future prospects for this disruptive technology in transport.
Access the data lab repository on Github.
Run the data pre-processing code on Google Colab.
In this video (available with English subtitles), the author explains what you will find both on Github and Google Colab.
2. Context: why is the electric vehicle important?
The transition towards more sustainable mobility has become a global priority, placing the electric vehicle (EV) at the centre of many discussions on the future of transport. In Spain, this trend towards the electrification of the car fleet not only responds to a growing consumer interest in cleaner and more efficient technologies, but also to a regulatory and incentive framework designed to accelerate the adoption of these vehicles. With a growing range of electric models available on the market, electric vehicles represent a key part of the country's strategy to reduce greenhouse gas emissions, improve urban air quality and foster technological innovation in the automotive sector.
However, the penetration of EVs in the Spanish market faces a number of challenges, from charging infrastructure to consumer perception and knowledge of EVs. Expansion of the charging network, together with supportive policies and fiscal incentives, is key to overcoming existing barriers and stimulating demand. As Spain moves towards its sustainability and energy transition goals, analysing the evolution of the electric vehicle market becomes an essential tool to understand the progress made and the obstacles that still need to be overcome.
3. Objective
This exercise focuses on showing the reader techniques for the processing, visualisation and advanced analysis of open data using Python. We will adopt a "learning-by-doing" approach so that the reader can understand the use of these tools in the context of solving a real and topical challenge such as the study of EV penetration in Spain. This hands-on approach not only enhances understanding of data science tools, but also prepares readers to apply this knowledge to solve real problems, providing a rich learning experience that is directly applicable to their own projects.
The questions we will try to answer through our analysis are:
- Which vehicle brands led the market in 2023?
- Which vehicle models were the best-selling in 2023?
- What market share did electric vehicles capture in 2023?
- Which electric vehicle models were the best-selling in 2023?
- How have vehicle registrations evolved over time?
- Are we seeing any trends in electric vehicle registrations?
- How do we expect electric vehicle registrations to develop next year?
- How much CO2 emission reduction can we expect from the registrations achieved over the next year?
4. Resources
To complete the development of this exercise we will require the use of two categories of resources: Analytical Tools and Datasets.
4.1. Dataset
To complete this exercise we will use a dataset provided by the Dirección General de Tráfico (DGT) through its statistical portal, also available from the National Open Data catalogue (datos.gob.es). The DGT statistical portal is an online platform aimed at providing public access to a wide range of data and statistics related to traffic and road safety. This portal includes information on traffic accidents, offences, vehicle registrations, driving licences and other relevant data that can be useful for researchers, industry professionals and the general public.
In our case, we will use their dataset of vehicle registrations in Spain available via:
- Open Data Catalogue of the Spanish Government.
- Statistical portal of the DGT.
Although during the development of the exercise we will show the reader the necessary mechanisms for downloading and processing, we include pre-processed data in the associated GitHub repository, so that the reader can proceed directly to the analysis of the data if desired.
*The data used in this exercise were downloaded on 04 March 2024. The licence applicable to this dataset can be found at https://datos.gob.es/avisolegal.
4.2. Analytical tools
- Programming language: Python - a programming language widely used in data analysis due to its versatility and the wide range of libraries available. These tools allow users to clean, analyse and visualise large datasets efficiently, making Python a popular choice among data scientists and analysts.
- Platform: Jupyter Notebooks - a web application that allows you to create and share documents containing live code, equations, visualisations and narrative text. It is widely used for data science, data analytics, machine learning and interactive programming education.
- Main libraries and modules:
  - Data manipulation: Pandas - an open source library that provides high-performance, easy-to-use data structures and data analysis tools.
  - Data visualisation:
    - Matplotlib: a library for creating static, animated and interactive visualisations in Python.
    - Seaborn: a library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphs.
  - Statistics and algorithms:
    - Statsmodels: a library that provides classes and functions for estimating many different statistical models, as well as for testing and exploring statistical data.
    - Pmdarima: a library specialised in automatic time series modelling, facilitating the identification, fitting and validation of models for complex forecasts.
5. Exercise development
It is advisable to run the Notebook with the code while reading the post, as the two teaching resources complement each other in the explanations that follow.
The proposed exercise is divided into three main phases.
5.1 Initial configuration
This section can be found in point 1 of the Notebook.
In this short first section, we will configure our Jupyter Notebook and our working environment to be able to work with the selected dataset. We will import the necessary Python libraries and create some directories where we will store the downloaded data.
5.2 Data preparation
This section can be found in point 2 of the Notebook.
All data analysis requires a phase of accessing and processing to obtain the appropriate data in the desired format. In this phase, we will download the data from the statistical portal and transform it into the Apache Parquet format before proceeding with the analysis.
Users who want to go deeper into this task can read the Practical Introductory Guide to Exploratory Data Analysis.
5.3 Data analysis
This section can be found in point 3 of the Notebook.
5.3.1 Descriptive analysis
In this third phase, we will begin our data analysis. To do so, we will answer the first questions using data visualisation tools to familiarise ourselves with the data. Some examples of the analysis are shown below:
- Top 10 Vehicles registered in 2023: In this visualisation we show the ten vehicle models with the highest number of registrations in 2023, also indicating their propulsion type. The main conclusions are:
  - The only European-made vehicles in the Top 10 are the Arona and the Ibiza from Spanish brand SEAT. The rest are Asian.
  - Nine of the ten vehicles are powered by petrol.
  - The only vehicle in the Top 10 with a different type of propulsion is the DACIA Sandero LPG (Liquefied Petroleum Gas).

Figure 1. Graph "Top 10 vehicles registered in 2023"
- Market share by propulsion type: In this visualisation we represent the percentage of vehicles registered by each type of propulsion (petrol, diesel, electric or other). We see how the vast majority of the market (>70%) was taken up by petrol vehicles, with diesel being the second choice, and how electric vehicles reached 5.5%.

Figure 2. Graph "Market share by propulsion type".
- Historical development of registrations: This visualisation represents the evolution of vehicle registrations over time. It shows the monthly number of registrations between January 2015 and December 2023, distinguishing between the propulsion types of the registered vehicles. There are several interesting aspects of this graph:
  - We observe an annual seasonal behaviour, i.e. patterns or variations that are repeated at regular time intervals. We see recurring high levels of registrations in June/July, while in August/September they decrease drastically. This is very relevant, as the analysis of time series with a seasonal factor has certain particularities.
  - The huge drop in registrations during the first months of COVID is also very remarkable.
  - We also see that post-COVID registration levels are lower than before.
  - Finally, we can see how between 2015 and 2023 the registration of electric vehicles gradually increases.

Figure 3. Graph "Vehicle registrations by propulsion type".
- Trend in the registration of electric vehicles: We now analyse the evolution of electric and non-electric vehicles separately, using heat maps as a visual tool. The two graphs show very different behaviours. Electric vehicle registrations increase year after year and, although COVID brought vehicle registrations to a halt, subsequent years have maintained the upward trend.

Figure 4. Graph "Trend in registration of conventional vs. electric vehicles".
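The descriptive steps above come down to counting and grouping registrations. The following minimal sketch shows the two calculations behind the market share chart and the heat maps. It uses plain Python so that it runs on its own, whereas the Notebook relies on the libraries listed in section 4.2. The records and the function names are invented for illustration and are not taken from the DGT dataset.

```python
from collections import Counter, defaultdict

# Invented records: (year, month, propulsion type).
registrations = [
    (2022, 1, "petrol"), (2022, 1, "diesel"), (2022, 2, "electric"),
    (2023, 1, "petrol"), (2023, 1, "petrol"), (2023, 1, "electric"),
    (2023, 2, "diesel"), (2023, 2, "electric"),
]


def market_share(records, year):
    """Percentage of registrations per propulsion type in the given year."""
    counts = Counter(propulsion for y, _, propulsion in records if y == year)
    total = sum(counts.values())
    return {propulsion: 100 * n / total for propulsion, n in counts.items()}


def year_month_grid(records, propulsion):
    """Counts indexed as grid[year][month], the table a heat map would colour."""
    grid = defaultdict(lambda: defaultdict(int))
    for year, month, p in records:
        if p == propulsion:
            grid[year][month] += 1
    return {year: dict(months) for year, months in grid.items()}


share_2023 = market_share(registrations, 2023)
electric_grid = year_month_grid(registrations, "electric")
```

Arranging the counts with years on one axis and months on the other is what makes both the seasonal pattern and the year-on-year trend visible in a heat map.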
5.3.2. Predictive analytics
To answer the last question objectively, we will use predictive models that allow us to estimate the evolution of electric vehicle registrations in Spain. As we can see, the model we built projects continued growth in registrations throughout the year, to around 70,000, reaching values close to 8,000 registrations in December 2024 alone.

Figure 5. Graph "Predicted electric vehicle registrations".
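To show why seasonality matters when forecasting monthly registrations, here is a minimal sketch of a much simpler technique than the one used in the Notebook: a seasonal naive baseline with drift. Each future month is predicted as the same month of the last observed year plus the average year-over-year change. It is only a reference point for judging more elaborate models, and the function name and parameters are assumptions made for this example.

```python
def seasonal_naive_forecast(series, season_length=12, horizon=12):
    """Forecast each future period as the value one season earlier plus the
    average change between the last two full seasons (the drift)."""
    if len(series) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last = series[-season_length:]
    previous = series[-2 * season_length:-season_length]
    # Average year-over-year change across the periods of the season.
    drift = sum(a - b for a, b in zip(last, previous)) / season_length
    return [
        last[i % season_length] + drift * (i // season_length + 1)
        for i in range(horizon)
    ]


# Invented quarterly-style history (season of 4 periods) to keep the example short.
history = [10, 20, 30, 20, 14, 24, 34, 24]
forecast = seasonal_naive_forecast(history, season_length=4, horizon=6)
```

A model fitted with a library such as Pmdarima should at least beat this baseline on held-out months before its forecasts are trusted.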
6. Conclusions
In conclusion, the analysis techniques used show that the electric vehicle is penetrating the Spanish vehicle fleet at an increasing speed, although it is still a long way behind alternatives such as diesel or petrol. For now, the electric segment is led by the manufacturer Tesla. We will see in the coming years whether the pace grows at the level needed to meet the sustainability targets set and whether Tesla remains the leader despite the strong entry of Asian competitors.
7. Do you want to do the exercise?
If you want to learn more about the Electric Vehicle and test your analytical skills, go to this code repository where you can develop this exercise step by step.
Also, remember that you have more exercises at your disposal in the "Step-by-step visualisations" section.
Content prepared by Juan Benavente, industrial engineer and expert in technologies linked to the data economy. The contents and points of view reflected in this publication are the sole responsibility of the author.
The Council of Ministers approved in February this year the Sustainable Mobility Bill (PL), a commitment to a digital and innovative transport system in which open mobility data will play a key role.
In addition to regulating innovative solutions such as on-demand transport, car sharing or temporary use of vehicles, the regulation will encourage the promotion of open data by administrations, infrastructure managers and public and private operators. All this, as stated in Chapter III of Title V of the Draft Law, "will bring enormous benefits to citizens, e.g. for new mobility and their contribution to the European Green Deal".
This Bill is aligned with the European Data Strategy, which has among its objectives to create a single market for data that ensures Europe's global competitiveness and data sovereignty through the creation of common European data spaces in nine strategic sectors. In particular, it foresees the creation and development of a common European mobility data space to put Europe at the forefront of the development of a smart transport system, including connected cars and other modes of transport. Along these lines, the European Commission presented its Sustainable and Smart Mobility Strategy, which includes an action dedicated to innovation, data and artificial intelligence for smarter mobility. Following in Europe's footsteps, Spain has launched this Sustainable Mobility Bill.
In this post we look at the benefits that the use of open data can bring to the sector, the obligations that the PL will place on data, and the next steps in building the Integrated Mobility Data Space.
Benefits of using open data on sustainable mobility
The Ministry of Transport and Sustainable Mobility, in the web section created for the Law, identifies some of the benefits that access to and use of open transport and mobility data can offer both to the business community and to public administrations and citizens in general:
- Encourage the development of applications that enable citizens to make decisions on the planning of their journeys and during the course of their journeys.
- Improve the conditions of service provision and the travel experience.
- Incentivise research, create new developments and businesses from the data generated in the transport and mobility ecosystem.
- Enable public administrations to have a better understanding of the transport and mobility system in order to improve the definition of public policies and the management of the system.
- Encourage the use of this data for other public interest purposes that may arise.

Ensuring access to open mobility data
In order to make good use of these data and thus take advantage of all the benefits they offer, the Draft Law determines a strategy to ensure the availability of open data in the field of transport and mobility. This strategy concerns:
- transport companies and infrastructure managers, which must drive digitalisation and provide a significant part of the data, with specific characteristics and functionalities.
- administrations and public entities were already obliged to ensure the openness of their data by design, as well as its re-use on the basis of the already existing
In short, the guidelines for re-use already defined in Law 37/2007 for the public sector are respected, and the need to regulate access to this information and the way in which this data is used by third parties, i.e. companies in the sector, is also included.
Integrated Mobility Data Space
In line with the European Data Strategy mentioned at the beginning of the post, the PL determines the obligation to create the Integrated Mobility Data Space (EDIM) under the direction of the Ministry of Transport and Sustainable Mobility, in coordination with the Secretary of State for Digitalisation and Artificial Intelligence. In the EDIM, the aforementioned transport companies, infrastructure managers and administrations will share their data, which will optimise the decision making of all actors when planning the implementation of new infrastructures and the launch of new services.
The Draft Law defines some characteristics of the Integrated Mobility Data Space such as the modular structure, which will include information in a systematic way on different areas of urban, metropolitan and interurban mobility, both for people and goods.
Specifically, the EDIM, according to Article 14, would collect data "in digital form in a free, non-discriminatory and up-to-date manner" on:
- Supply and demand of the different modes of transport and mobility, information on public transport services and mobility services under the responsibility of the administrations
- Financial situation and costs of providing services for all modes of public transport, investments in transport infrastructure, inventory of transport infrastructure and terminals, conditions and degree of accessibility.
- Other data to be agreed at the Sectoral Conference on Transport.
It identifies examples of this type of data and information on the responsibility for its provision, format, frequency of updating and other characteristics.
As stated in the PL, the data and information managed by the EDIM will provide an integrated vision to analyse and facilitate mobility management, improving the design of sustainable and efficient solutions, and transparency in the design of public transport and mobility policies. In addition, the Law will promote the creation of a sandbox or test environment to serve as an incubator for innovative mobility projects. The outcome of the tests will allow both the developer and the administration to learn by observing the market in a controlled environment.
National Multimodal Transport Access Point
On the other hand, the Bill also provides for the creation of a National Multimodal Transport Access Point that will collect the information communicated to the Ministry of Transport and Sustainable Mobility in the framework of the priority action "Provision of information services on multimodal journeys throughout the Union" of Directive 2010/40/EU, which refers to the transport of goods and/or persons by more than one means of transport.
This information will be freely accessible and will also serve to feed the EDIM in the area related to the characterisation of transport and mobility of persons, as well as the National Catalogue of Public Information maintained by the General State Administration.
The Bill defines that the provision of services to citizens using transport and mobility data from the National Multimodal Transport Access Point must be done in a fair, neutral, impartial, non-discriminatory and transparent manner. It adds that the Ministry of Transport and Sustainable Mobility will propose rules for the use of such data within 12 months after the entry into force of this law.
The Sustainable Mobility Bill is currently in parliamentary procedure, as it has been sent to the Spanish Parliament for urgent processing and approval in 2024.
This application shows the location of charging stations for electric vehicles within the Community of Castilla y León. The user can select the province of interest and access information on the available stations. For each station, various information is provided, such as its location, the supplier company, the number of charging stations and the available connectors.
In addition, within the app the user can also find information on incentive programmes for electric mobility, such as the MOVES III Plan.
The geographical data of these chargers have been obtained from the servers of the public platform of Open Data of Castilla y León.
The application provides users with real-time information on the status of the parking lot: free spaces and occupied spaces. By means of a map of the parking lot, the user can directly visualize the occupancy status of each parking space (free or occupied) and the entire parking lot (number of free and occupied spaces). It also shows the option of consulting the access route to the parking lot from the user's position. Another type of data offered is the current rates for the use of the parking lot.
The data used by the application to display information, such as the set of data on parking sensors, are available on the transparency portal of the City Council of Villanueva de la Serena.
1. Introduction
Visualizations are graphical representations of data that allow the information linked to them to be communicated in a simple and effective way. The visualization possibilities are very wide, from basic representations, such as line, bar or sector graphs, to visualizations configured on interactive dashboards.
In this "Step-by-Step Visualizations" section we regularly present practical exercises on open data visualization using data available in datos.gob.es or other similar catalogs. They address and describe in a simple way the stages needed to obtain the data, perform the relevant transformations and analyses and, finally, create interactive visualizations from which we can draw conclusions that summarise the information. In each of these practical exercises, simple and well-documented code is used, as well as tools that are free to use. All generated material is available for reuse in the GitHub Data Lab repository.
As a complement to the explanation below, you can access the code that we will use in the exercise and that we explain and develop in the following sections of this post.
Access the data lab repository on GitHub.
Run the data pre-processing code on top of Google Colab.
2. Objective
The main objective of this exercise is to show how to perform a network or graph analysis based on open data on rental bicycle trips in the city of Madrid. To do this, we will perform a preprocessing of the data in order to obtain the tables that we will use next in the visualization generating tool, with which we will create the visualizations of the graph.
Network analysis comprises the methods and tools used to study and interpret the relationships and connections between the entities, or interconnected nodes, of a network. These entities may be persons, sites, products or organizations, among others. Network analysis seeks to discover patterns, identify communities, analyze influence, and determine the importance of nodes within the network. This is achieved by using specific algorithms and techniques to extract meaningful insights from network data.
Once the data has been analyzed using this visualization, we can answer questions such as the following:
- What is the network station with the highest inbound and outbound traffic?
- What are the most common interstation routes?
- What is the average number of connections between stations for each of them?
- What are the most interconnected stations within the network?
3. Resources
3.1. Datasets
The open datasets used contain information on rental bicycle trips made in the city of Madrid. They include the origin and destination stations, the time and duration of the journey and the identifier of the bicycle, among other fields.
These open datasets are published by the Madrid City Council, through files that collect the records on a monthly basis.
These datasets are also available for download from the following Github repository.
3.2. Tools
The data preprocessing tasks were carried out in Python, in a Jupyter Notebook hosted on the Google Colab cloud service.
"Google Colab", also called Google Colaboratory, is a cloud service from Google Research that allows you to write, execute and share code in Python or R in a Jupyter Notebook from your browser, so it requires no configuration. This service is free of charge.
For the creation of the interactive visualization, the Gephi tool has been used.
"Gephi" is a network visualization and analysis tool. It allows you to represent and explore relationships between elements, such as nodes and links, in order to understand the structure and patterns of the network. The program requires download and is free.
If you want to know more about tools that can help you in the treatment and visualization of data, you can use the report "Data processing and visualization tools".
4. Data processing or preparation
The processes described below are commented in the Notebook, which you can also run from Google Colab.
Due to the high volume of trips recorded in the datasets, we defined the following starting points when analysing them:
- We will analyse the time of day with the highest travel traffic
- We will analyse the stations with a higher volume of trips
Before analysing the data and building an effective visualization, we must pre-process it, paying special attention to how it is obtained and to validating its content, making sure that it is in an appropriate and consistent format for processing and that it contains no errors.
As a first step, it is necessary to perform an exploratory data analysis (EDA) in order to properly interpret the starting data and detect anomalies, missing data or errors that could affect the quality of subsequent processes and results. If you want to know more about this process, you can consult the Practical Guide of Introduction to Exploratory Data Analysis.
The next step is to generate the pre-processed data table that we will use to feed the network analysis tool (Gephi) that will visually help us understand the information. To do this, we will modify, filter and join the data according to our needs.
The steps followed in this data preprocessing, explained in this Google Colab Notebook, are as follows:
- Installation of libraries and loading of datasets
- Exploratory Data Analysis (EDA)
- Generating pre-processed tables
You will be able to reproduce this analysis with the source code that is available in our GitHub account. The way to provide the code is through a document made on a Jupyter Notebook that, once loaded into the development environment, you can easily run or modify.
Due to the informative nature of this post and to favour the understanding of non-specialized readers, the code is not intended to be the most efficient but to facilitate its understanding, so you will possibly come up with many ways to optimize the proposed code to achieve similar purposes. We encourage you to do so!
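As an illustration of the "Generating pre-processed tables" step, the following minimal sketch counts trips per origin and destination pair to obtain an edge table. It is not the code of the Notebook: the column names (origin_station, destination_station), the sample rows and the decision to discard trips that start and end at the same station are assumptions made for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of trip records; the real monthly files have many
# more rows and their own column names.
SAMPLE_TRIPS = """origin_station,destination_station,duration_s
Plaza de la Cebada,Quevedo,540
Plaza de la Cebada,Quevedo,610
Quevedo,Plaza de la Cebada,580
Quevedo,Segovia 45,300
Segovia 45,Segovia 45,120
"""


def build_edge_table(trips_csv):
    """Count trips for each (origin, destination) pair of stations."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(trips_csv)):
        source = row["origin_station"].strip()
        target = row["destination_station"].strip()
        if not source or not target or source == target:
            # Assumption of this sketch: drop incomplete records and
            # trips that start and end at the same station.
            continue
        counts[(source, target)] += 1
    # Sort by descending number of trips, then alphabetically.
    return sorted(
        ((s, t, w) for (s, t), w in counts.items()),
        key=lambda edge: (-edge[2], edge[0], edge[1]),
    )


edges = build_edge_table(SAMPLE_TRIPS)
for source, target, weight in edges:
    print(source, "->", target, weight)
```

Each resulting row holds a source station, a target station and the number of trips between them, which is the kind of table a network analysis tool can read.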
5. Network analysis
5.1. Definition of the network
The analysed network is formed by the trips between different bicycle stations in the city of Madrid, having as main information of each of the registered trips the station of origin (called "source") and the destination station (called "target").
The network consists of 253 nodes (stations) and 3012 edges (interactions between stations). It is a directed graph, because trips can be made in both directions between two stations and each direction is recorded separately, and a weighted graph, because each edge has an associated numerical value called "weight", which in this case corresponds to the number of trips made between the two stations it connects.
5.2. Loading the pre-processed table into Gephi
Using the "import spreadsheet" option on the file tab, we import the previously pre-processed data table in CSV format. Gephi will detect what type of data is being loaded, so we will use the default predefined parameters.
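For reference, an edge table can be written to CSV as in the sketch below. The column names Source, Target, Type and Weight are the ones Gephi's spreadsheet importer conventionally recognises for edge tables; whether the Notebook uses exactly these headers is an assumption, and the sample edges are hypothetical.

```python
import csv
import io

# Hypothetical edge list: (origin station, destination station, trips).
edges = [
    ("Plaza de la Cebada", "Quevedo", 2),
    ("Quevedo", "Plaza de la Cebada", 1),
]


def edges_to_gephi_csv(edge_list):
    """Serialise edges with Source/Target/Type/Weight column headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for source, target, weight in edge_list:
        # Every edge is marked as directed: A -> B differs from B -> A.
        writer.writerow([source, target, "Directed", weight])
    return buffer.getvalue()


csv_text = edges_to_gephi_csv(edges)
print(csv_text)
```

To produce a file instead of a string, the same writer can be pointed at a file opened with open(path, "w", newline="", encoding="utf-8").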
5.3. Network display options
5.3.1 Distribution window
First, in the distribution window, we apply the Force Atlas 2 algorithm. This algorithm uses node repulsion based on the degree of connection, so that sparsely connected nodes are pushed away from those that are more strongly attracted to each other.
To keep the connected components from drifting out of the main view, we set the "Gravity" parameter in the Tuning section to 10, and to keep the nodes from piling up, we check the "Dissuade Hubs" and "Prevent Overlap" options.
Within the distribution window, we also apply the Expansion algorithm so that the nodes are not so close to each other.
Figure 3. Distribution window - Expansion algorithm
5.3.2 Appearance window
Next, in the appearance window, we modify the nodes and their labels so that their size is not uniform but depends on the degree of each node (the higher the degree, the larger the visual size). We also modify the colour of the nodes so that the larger ones have a more striking colour than the smaller ones. In the same appearance window we modify the edges. In this case we have opted for a single colour for all of them, since by default their thickness depends on their weight.
A higher degree in one of the nodes implies a greater number of stations connected to that node, while a greater weight of the edges implies a greater number of trips for each connection.
5.3.3 Graph window
Finally, in the lower area of the graph window interface, we have several options, such as showing or hiding the labels of the nodes, adapting the size of the edges to make the visualization cleaner, or modifying the font of the labels.
Next, we can see the visualization of the graph that represents the network once the visualization options mentioned in the previous points have been applied.
Figure 6. Graph display
When the option to display labels is activated and the cursor is placed on one of the nodes, the links of that node and the nodes connected to it through those links are displayed.
Next, we can visualize the nodes and links related to the bicycle station "Fernando el Católico". In the visualization, the nodes that have a greater number of connections are easily distinguished, since they appear with a larger size and more striking colours, such as "Plaza de la Cebada" or "Quevedo".
5.4 Main network measures
Together with the visualization of the graph, the following measures provide the main information about the analysed network. These measures, which are the usual metrics in network analytics, can be calculated in the statistics window.
- Nodes (N): the individual elements that make up a network, representing different entities. In this case, the bicycle stations. Its value in the network is 253.
- Links (L): the connections that exist between the nodes of a network. Links represent the relationships or interactions between the individual elements (nodes) that make up the network. Its value in the network is 3012.
- Maximum number of links (Lmax): the maximum possible number of links in the network. It is calculated with the formula Lmax = N(N-1)/2. Its value in the network is 31878.
- Average degree (k): a statistical measure that quantifies the average connectivity of the network nodes. It is calculated by averaging the degrees of all nodes in the network. Its value in the network is 23.8.
- Network density (d): the proportion of connections between network nodes relative to the total number of possible connections. Its value in the network is 0.047.
- Diameter (dmax): the longest graph distance between any two nodes of the network, i.e. how far apart the two most distant nodes are. Its value in the network is 7.
- Mean distance (d): the average graph distance between the nodes of the network. Its value in the network is 2.68.
- Mean clustering coefficient (C): indicates how nodes are embedded among their neighbouring nodes. The average value gives a general indication of the grouping in the network. Its value in the network is 0.208.
- Connected components: groups of nodes that are directly or indirectly connected to each other but are not connected to nodes outside that group. The network has 24.
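Several of these measures can be cross-checked with simple arithmetic. The sketch below assumes the figures given in section 5.1 (253 nodes and 3012 edges). It also assumes that the density is computed over ordered pairs of stations, N(N-1), which is the usual convention for directed graphs and is the reading that reproduces the reported 0.047.

```python
# Values taken from section 5.1 of this post.
N = 253   # nodes (stations)
L = 3012  # directed edges (interactions between stations)

# Number of unordered pairs of stations, as in the Lmax formula.
l_max = N * (N - 1) // 2

# In a directed graph every edge adds one outgoing and one incoming
# connection, so the average total degree is 2L / N.
average_degree = 2 * L / N

# Assumption: density over ordered pairs, L / (N(N-1)).
density = L / (N * (N - 1))

print("Lmax:", l_max)
print("Average degree:", round(average_degree, 1))
print("Density:", round(density, 3))
```

Under these assumptions the script reproduces the values listed for Lmax (31878), the average degree (23.8) and the density (0.047).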
5.5 Interpretation of results
The degree distribution roughly follows a long-tail distribution: a few stations interact with a large number of other stations, while most interact with only a few.
The average degree is 23.8, which indicates that each station interacts on average with about 24 other stations (input and output).
In the following graph we can see that, although some nodes have degrees considered high (80, 90, 100, ...), 25% of the nodes have degrees equal to or less than 8, while 75% of the nodes have degrees less than or equal to 32.
The previous graph can be broken down into the following two, corresponding to the input degree and the output degree (since the network is directed). Both have similar long-tail distributions, and their mean degree is the same, 11.9.
Their main difference is that the input degree distribution has a median of 7 while the output degree distribution has a median of 9, which means that most nodes have lower degrees for input than for output.
The average weighted degree is 346.07, which indicates the average number of total trips in and out of each station.
The network density of 0.047 is considered a low density indicating that the network is dispersed, that is, it contains few interactions between different stations in relation to the possible ones. This is considered logical because connections between stations will be limited to certain areas due to the difficulty of reaching stations that are located at long distances.
The average clustering coefficient is 0.208 meaning that the interaction of two stations with a third does not necessarily imply interaction with each other, that is, it does not necessarily imply transitivity, so the probability of interconnection of these two stations through the intervention of a third is low.
Finally, the network has 24 connected components, of which 2 are weakly connected components and 22 are strongly connected components.
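To make the input degree, output degree and weighted degree discussed in this section more concrete, here is a minimal sketch on a hypothetical four-station network. The station names and trip counts are invented for the example and the code is not the one used by Gephi.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical weighted, directed edges: (source, target, trips).
edges = [
    ("A", "B", 5),
    ("B", "A", 3),
    ("A", "C", 2),
    ("C", "B", 4),
    ("D", "A", 1),
]

in_degree = defaultdict(int)        # incoming connections per station
out_degree = defaultdict(int)       # outgoing connections per station
weighted_degree = defaultdict(int)  # trips in plus trips out

for source, target, weight in edges:
    out_degree[source] += 1
    in_degree[target] += 1
    weighted_degree[source] += weight
    weighted_degree[target] += weight

nodes = sorted({node for s, t, _ in edges for node in (s, t)})
degree = {node: in_degree[node] + out_degree[node] for node in nodes}

in_values = [in_degree[node] for node in nodes]
out_values = [out_degree[node] for node in nodes]

mean_degree = mean(degree.values())
mean_in, mean_out = mean(in_values), mean(out_values)
median_in, median_out = median(in_values), median(out_values)
mean_weighted = mean(weighted_degree[node] for node in nodes)

print("Degree per station:", degree)
print("Mean degree:", mean_degree)
print("Mean in/out degree:", mean_in, mean_out)
print("Median in/out degree:", median_in, median_out)
print("Mean weighted degree:", mean_weighted)
```

The mean input and output degrees always coincide, because every edge contributes exactly one of each, whereas the medians can differ, as they do in the Madrid network.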
5.6 Centrality analysis
A centrality analysis refers to the assessment of the importance of nodes in a network using different measures. Centrality is a fundamental concept in network analysis and is used to identify key or influential nodes within a network. To perform this task, you start from the metrics calculated in the statistics window.
- The degree centrality measure indicates that the higher the degree of a node, the more important it is. The five stations with the highest values are: 1º Plaza de la Cebada, 2º Plaza de Lavapiés, 3º Fernando el Católico, 4º Quevedo, 5º Segovia 45.
- The closeness centrality indicates that the higher the proximity value of a node, the more central it is, since it can reach any other node in the network with the least possible effort. The five stations with the highest values are: 1º Fernando el Católico, 2º General Pardiñas, 3º Plaza de la Cebada, 4º Plaza de Lavapiés, 5º Puerta de Madrid.
- The measure of betweenness centrality indicates that the greater the intermediation measure of a node, the more important it is since it is present in more interaction paths between nodes than the rest of the nodes in the network. The five stations with the highest values are: 1º Fernando el Católico, 2º Plaza de Lavapiés, 3º Plaza de la Cebada, 4º Puerta de Madrid, 5º Quevedo.
With the Gephi tool you can calculate a large number of metrics and parameters that are not covered in this study, such as eigenvector centrality and its distribution.
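The three centrality measures above can be sketched on a small hypothetical network as follows. This is an illustrative implementation and not Gephi's: closeness is taken here as the inverse of the mean shortest-path distance to the reachable nodes, and betweenness is computed without normalisation using Brandes' algorithm for unweighted graphs, so the absolute values may differ from those reported by the tool.

```python
from collections import deque

# Hypothetical directed network: each station maps to the stations that
# trips leave for. Every link here exists in both directions.
graph = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}


def degree_centrality(g):
    """Total degree: outgoing plus incoming connections of each node."""
    degree = {node: len(targets) for node, targets in g.items()}
    for targets in g.values():
        for target in targets:
            degree[target] += 1
    return degree


def closeness_centrality(g):
    """Inverse of the mean shortest-path distance to reachable nodes."""
    scores = {}
    for start in g:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from the start node
            v = queue.popleft()
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness_centrality(g):
    """Unnormalised betweenness (Brandes' algorithm, unweighted graph)."""
    scores = {node: 0.0 for node in g}
    for source in g:
        order = []                         # nodes in order of discovery
        preds = {node: [] for node in g}   # predecessors on shortest paths
        sigma = {node: 0 for node in g}    # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in g}
        for w in reversed(order):  # accumulate from the farthest nodes
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    return scores


degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for node in graph:
    print(node, degree[node], round(closeness[node], 3), betweenness[node])
```

In this toy network station B ranks first on all three measures, but this need not be the case in general, which is why the rankings of the Madrid stations differ from one measure to another.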
5.7 Filters
Through the filtering window, we can select certain parameters that simplify the visualizations in order to show the relevant information of the network analysis more clearly.
Next, we show several of the filters applied:
- Range (degree) filtering, which shows nodes with a degree greater than 50, accounting for 13.44% of the nodes (34 nodes) and 15.41% of the edges (464 edges).
- Edge filtering (edge weight), which shows edges with a weight greater than 100, accounting for 0.7% of the edges (20 edges).
Within the filters window, there are many other filtering options on attributes, ranges, partition sizes, edges, ... with which you can try to make new visualizations to extract information from the graph. If you want to know more about the use of Gephi, you can consult the following courses and trainings about the tool.
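The two filters described above can be reproduced outside Gephi with a few lines of code. The sketch below uses a hypothetical edge list and lower thresholds than those of the exercise. It is only meant to show the logic of a degree range filter and an edge weight filter, not Gephi's implementation.

```python
from collections import Counter

# Hypothetical weighted, directed edges: (source, target, trips).
edges = [
    ("A", "B", 120),
    ("B", "A", 80),
    ("A", "C", 150),
    ("B", "C", 10),
    ("C", "D", 30),
    ("D", "A", 101),
]


def filter_by_degree(edge_list, min_degree):
    """Keep nodes whose total degree exceeds min_degree and the edges
    whose two endpoints are both kept."""
    degree = Counter()
    for source, target, _ in edge_list:
        degree[source] += 1
        degree[target] += 1
    kept_nodes = {node for node, d in degree.items() if d > min_degree}
    kept_edges = [
        (s, t, w) for s, t, w in edge_list
        if s in kept_nodes and t in kept_nodes
    ]
    return kept_nodes, kept_edges


def filter_by_weight(edge_list, min_weight):
    """Keep edges whose weight exceeds min_weight."""
    return [(s, t, w) for s, t, w in edge_list if w > min_weight]


all_nodes = {node for s, t, _ in edges for node in (s, t)}
kept_nodes, kept_edges = filter_by_degree(edges, min_degree=2)
heavy_edges = filter_by_weight(edges, min_weight=100)

node_share = 100 * len(kept_nodes) / len(all_nodes)
heavy_share = 100 * len(heavy_edges) / len(edges)

print(sorted(kept_nodes), len(kept_edges), f"{node_share:.2f}% of nodes")
print(heavy_edges, f"{heavy_share:.2f}% of edges")
```

The percentages are computed in the same way as those quoted for the filters, as the share of nodes or edges that remain visible after filtering.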
6. Conclusions of the exercise
Once the exercise is done, we can draw the following conclusions:
- The three stations most interconnected with other stations are Plaza de la Cebada (133), Plaza de Lavapiés (126) and Fernando el Católico (114).
- The station with the highest number of input connections is Plaza de la Cebada (78), while the highest number of output connections (57) is shared by Plaza de Lavapiés and Fernando el Católico.
- The three stations with the highest number of total trips are Plaza de la Cebada (4524), Plaza de Lavapiés (4237) and Fernando el Católico (3526).
- There are 20 routes with more than 100 trips. The 3 routes with the most trips are: Puerta de Toledo – Plaza Conde Suchil (141), Quintana Fuente del Berro – Quintana (137), Camino Vinateros – Miguel Moya (134).
- Taking into account the number of connections between stations and trips, the most important stations within the network are: Plaza de la Cebada, Plaza de Lavapiés and Fernando el Católico.
We hope that this step-by-step visualization has been useful for learning some very common techniques in the treatment and representation of open data. We will be back to show you further reuses. See you soon!
Open data is a valuable tool for making informed decisions that encourage the success of a process and enhance its effectiveness. From a sectorial perspective, open data provides relevant information about the legal, educational, or health sectors. All of these, along with many other areas, utilize open sources to measure improvement compliance or develop tools that streamline work for professionals.
The benefits of using open data are extensive, and their variety goes hand in hand with technological innovation: every day, more opportunities arise to employ open data in the development of innovative solutions. An example of this can be seen in urban development aligned with the sustainability values advocated by the United Nations (UN).
Cities cover only 3% of the Earth's surface; however, they emit 70% of carbon emissions and consume over 60% of the world's resources, according to the UN. In 2023, more than half of the global population lives in cities, and this figure is projected to keep growing. By 2030, it is estimated that over 5 billion people will live in cities, that is, more than 60% of the world's population.
Despite this trend, infrastructures and neighborhoods do not meet the appropriate conditions for sustainable development, and the goal is to "Make cities and human settlements inclusive, safe, resilient, and sustainable," as recognized in Sustainable Development Goal (SDG) number 11. Proper planning and management of urban resources are significant factors in creating and maintaining sustainability-based communities. In this context, open data plays a crucial role in measuring compliance with this SDG and thus achieving the goal of sustainable cities.
In conclusion, open data stands as a fundamental tool for the strengthening and progress of sustainable city development.
In this infographic, we have gathered use cases that utilize sets of open data to monitor and/or enhance energy efficiency, transportation and urban mobility, air quality, and noise levels, all of which are issues that contribute to the proper functioning of urban centers.
