Application

  MOVACTIVA is a digital platform developed by the Department of Geography of the Universitat Autònoma de Barcelona, which works as an interactive atlas focused on active mobility, i.e. the transport of people using non-motorised means, such as walking or cycling. The atlas collects information from five Spanish cities: Barcelona, Granada, Madrid, Palma de Mallorca and Valencia.

The project maps five urban indicators that are decisive for active mobility, based on 57 georeferenced variables:

The combination of these five elements makes it possible to create an objective and standardised indicator: the Global Active Mobility Indicator. 

In addition, the website also offers information on:

  • Micromobility, which includes electric, small and light modes of transport (Personal Mobility Vehicles or PMVs), such as electric bicycles and scooters, hoverboards, segways and monowheels.
  • Intermodality, which involves the use of two or more modes of transport within the framework of a single trip.

To bring information closer to users, it has an interactive viewer that allows geographic data to be explored visually, facilitating comparison between cities and promoting a healthier and more sustainable urban approach. The indicators are superimposed on open access base maps such as the PNOA orthophotography (from the IGN) and OpenStreetMap.

calendar icon
Blog

Sport has always been characterized by generating a lot of data, statistics, graphs... But accumulating figures is not enough. It is necessary to analyze the data, draw conclusions and make decisions based on it. The advantages of sharing data in this sector go beyond mere sports, having a positive impact on health and the economic sphere. they go beyond mere sports, having a positive impact on health and the economic sphere.

Artificial intelligence (AI) has also reached the professional sports sector and its ability to process huge amounts of data has opened the door to making the most of the potential of all that information. Manchester City, one of the best-known football clubs in the British Premier League, was one of the pioneers in using artificial intelligence to improve its sporting performance:  it uses AI algorithms for the selection of new talent and has collaborated in the development of WaitTime, an artificial intelligence platform that manages the attendance of crowds in large sports and leisure venues. In Spain, Real Madrid, for example, incorporated the use of artificial intelligence a few years ago and promotes forums on the impact of AI on sport.

Artificial intelligence systems analyze extensive volumes of data collected during training and competitions, and are able to provide detailed evaluations on the effectiveness of strategies and optimization opportunities. In addition, it is possible to develop alerts on injury risks, allowing prevention measures to be established, or to create personalized training plans that are automatically adapted to each athlete according to their individual needs. These tools have completely changed contemporary high-level sports preparation. In this post we are going to review some of these use cases.

From simple observation to complete data management to optimize results

Traditional methods of sports evaluation have evolved into highly specialized technological systems. Artificial intelligence and machine learning tools process massive volumes of information during training and competitions, converting statistics, biometric data and audiovisual content into  strategic insights for the management of athletes' preparation and health.

Real-time performance analysis systems  are one of the most established implementations in the sports sector. To collect this data, it is common to see athletes training with bands or vests that monitor different parameters in real time. Both these and other devices and sensors record movements, speeds and biometric data. Heart rate, speed or acceleration are some of the most common data. AI algorithms process this information, generating immediate results that help optimize personalized training programs for each athlete and tactical adaptations, identifying patterns to locate areas for improvement.

In this sense, sports artificial intelligence platforms evaluate both individual performance and collective dynamics in the case of team sports. To evaluate the tactical area, different types of data are analyzed according to the sports modality. In endurance disciplines, speed, distance, rhythm or power are examined, while in team sports data on the position of the players or the accuracy of passes or shots are especially relevant.

Another advance is AI cameras, which allow you to follow the trajectory of players on the field and the movements of different elements, such as the ball in ball sports. These systems generate a multitude of data on positions, movements and patterns of play. The analysis of these historical data sets allows us to identify strategic strengths and vulnerabilities both our own and those of our opponents. This helps to generate different tactical options and improve decision-making before a competition.

Health and well-being of athletes

Sports injury prevention systems analyze historical data and metrics in real-time. Its algorithms identify injury risk patterns, allowing personalized preventive measures to be taken for each athlete. In the case of football, teams such as Manchester United, Liverpool, Valencia CF and Getafe CF have been implementing these technologies for several years.

In addition to the data we have seen above, sports monitoring platforms also record physiological variables continuously: heart rate, sleep patterns, muscle fatigue and movement biomechanics. Wearable devices  with artificial intelligence capabilities detect indicators of fatigue, imbalances, or physical stress that precede injuries. With this data, the algorithms predict patterns that detect risks and make it easier to act preventively, adjusting training or developing specific recovery programs before an injury occurs. In this way, training loads, rep volume, intensity and recovery periods can be calibrated according to individual profiles. This predictive maintenance for athletes is especially relevant for teams and clubs in which athletes are not only sporting assets, but also economic ones. In addition, these systems also optimise sports rehabilitation processes, reducing recovery times in muscle injuries by up to 30% and providing predictions on the risk of relapse.

While not foolproof, the data indicates that these platforms predict approximately 50% of injuries during sports seasons, although they cannot predict when they will occur. The application of AI to healthcare in sport thus contributes to the extension of professional sports careers, facilitating optimal performance and the athlete's athletic well-being in the long term.

Improving the audience experience

Artificial intelligence is also revolutionizing the way fans enjoy sport, both in stadiums and at home. Thanks to natural language processing (NLP) systems, viewers can follow comments and subtitles in real time, facilitating access for people with hearing impairments or speakers of other languages. Manchester City has recently incorporated this technology for the generation of real-time subtitles on the screens of its stadium. These applications have also reached other sports disciplines: IBM Watson has developed a functionality that allows Wimbledon fans to watch the videos with highlighted commentary and AI-generated subtitles.

In addition, AI optimises the management of large capacities through sensors and predictive algorithms, speeding up access, improving security and customising services such as seat locations. Even in broadcasts, AI-powered tools offer instant statistics,  automated highlights, and smart cameras that follow the action without human intervention, making the experience more immersive and dynamic. The NBA uses Second Spectrum, a system that combines cameras with AI to analyze player movements and create visualizations, such as passing routes or shot probabilities. Other sports, such as golf or Formula 1, also use similar tools that enhance the fan experience.

Data privacy and other challenges

The application of AI in sport also poses significant ethical challenges. The collection and analysis of biometric information raises doubts about the security and protection of athletes' personal data, so it is necessary to establish protocols that guarantee the management of consent, as well as the ownership of such data.

Equity  is another concern, as the application of artificial intelligence gives competitive advantages to teams and organizations with greater economic resources, which can contribute to perpetuating inequalities.

Despite these challenges, artificial intelligence has radically transformed the professional sports landscape. The future of sport seems to be linked to the evolution of this technology. Its application promises to continue to elevate athlete performance and the public experience, although some challenges need to be overcome.

calendar icon
Sectores
calendar icon
Noticia

Promoting the data culture is a key objective at the national level that is also shared by the regional administrations. One of the ways to achieve this purpose is to award those solutions that have been developed with open datasets, an initiative that enhances their reuse and impact on society.

On this mission, the Junta de Castilla y León and the Basque Government have been organising open data competitions for years, a subject we talked about in our first episode of the datos.gob.es podcast that you can listen to here.

In this post, we take a look at the winning projects in the latest editions of the open data competitions in the Basque Country and Castilla y León.

Winners of the 8th Castile and Leon Open Data Competition

In the eighth edition of this annual competition, which usually opens at the end of summer, 35 entries were submitted, from which 8 winners were chosen in different categories.

Ideas category: participants had to describe an idea to create studies, services, websites or applications for mobile devices. A first prize of 1,500€ and a second prize of 500€ were awarded.

  • First prize: Green Guardians of Castilla y León presented by Sergio José Ruiz Sainz. This is a proposal to develop a mobile application to guide visitors to the natural parks of Castilla y León. Users can access information (such as interactive maps with points of interest) as well as contribute useful data from their visit, which enriches the application.
  • Second prize: ParkNature: intelligent parking management system in natural spaces presented by Víctor Manuel Gutiérrez Martín. It consists of an idea to create an application that optimises the experience of visitors to the natural areas of Castilla y León, by integrating real-time data on parking and connecting with nearby cultural and tourist events.

Products and Services Category: Awarded studies, services, websites or applications for mobile devices, which must be accessible to all citizens via the web through a URL. In this category, first, second and third prizes of €2,500, €1,500 and €500 respectively were awarded, as well as a specific prize of €1,500 for students.

  • First prize: AquaCyL from Pablo Varela Vázquez. It is an application that provides information about the bathing areas in the autonomous community.
  • Second prize: ConquistaCyL presented by Markel Juaristi Mendarozketa and Maite del Corte Sanz. It is an interactive game designed for tourism in Castilla y León and learning through a gamified process.
  • Third prize: All the sport of Castilla y León presented by Laura Folgado Galache. It is an app that presents all the information of interest associated with a sport according to the province.
  • Student prizeOtto Wunderlich en Segovia by Jorge Martín Arévalo. It is a photographic repository sorted according to type of monuments and location of Otto Wunderlich's photographs.

Didactic Resource Category: consisted of the creation of new and innovative open didactic resources to support classroom teaching. These resources were to be published under Creative Commons licences. A single first prize of €1,500 was awarded in this category.

  • First prize: StartUp CyL: Business creation through Artificial Intelligence and Open Data presented by José María Pérez Ramos. It is a chatbot that uses the ChatGPT API to assist in setting up a business using open data.

Data Journalism category: awarded for published or updated (in a relevant way) journalistic pieces, both in written and audiovisual media, and offered a prize of €1,500.

Winners of the 5th edition of the Open Data Euskadi Open Data Competition

As in previous editions, the Basque open data portal opened two prize categories: an ideas competition and an applications competition, each of which was divided into several categories. On this occasion, 41 applications were submitted for the ideas competition and 30 for the applications competition.

Idea competition: In this category, two prizes of €3,000 and €1,500 have been awarded in each category.

Health and Social Category

Category Environment and Sustainability

  • First prize: Baratzapp by Leire Zubizarreta Barrenetxea. The idea consists of the development of a software that facilitates and assists in the planning of a vegetable garden by means of algorithms that seek to enhance the knowledge related to the self-consumption vegetable garden, while integrating, among others, climatological, environmental and plot information in a personalised way for the user.
  • Second prize: Euskal Advice by Javier Carpintero Ordoñez. The aim of this proposal is to define a tourism recommender based on artificial intelligence.

General Category

  • First prize: Lanbila by Hodei Gonçalves Barkaiztegi. It is a proposed app that uses generative AI and open data to match curriculum vitae with job offers in a semantic way.. It provides personalised recommendations, proactive employment and training alerts, and enables informed decisions through labour and territorial indicators.
  • Second prize: Development of an LLM for the interactive consultation of Open Data of the Basque Government by Ibai Alberdi Martín. The proposal consists in the development of a Large Scale Language Model (LLM) similar to ChatGPT, specifically trained with open data, focused on providing a conversational and graphical interface that allows users to get accurate answers and dynamic visualisations.

Applications competition: this modality has selected one project in the web services category, awarded with €8,000, and two more in the General Category, which have received a first prize of €8,000 and a second prize of €5,000.

Category Web Services

General Category

  • First prize: Garbiñe AI by Beatriz Arenal Redondo. It is an intelligent assistant that combines Artificial Intelligence (AI) with open data from Open Data Euskadi to promote the circular economy and improve recycling rates in the Basque Country.
  • Second prize: Vitoria-Gasteiz Businessmap by Zaira Gil Ozaeta. It is an interactive visualisation tool based on open data, designed to improve strategic decisions in the field of entrepreneurship and economic activity in Vitoria-Gasteiz.

All these award-winning solutions reuse open datasets from the regional portal of Castilla y León or Euskadi, as the case may be. We encourage you to take a look at the proposals that may inspire you to participate in the next edition of these competitions. Follow us on social media so you don't miss out on this year's calls!

calendar icon
Blog

In the digital age, data has become an invaluable tool in almost all areas of society, and the world of sport is no exception. The availability of data related to this field can have a positive impact on the promotion of health and wellbeing, as well as on the improvement of physical performance of both athletes and citizens in general. Moreover, its benefits are also evident in the economic sphere, as this data can be used to publicise the sports offer or to generate new services, among other things.

Here are three examples of their impact.

Promoting an active and healthy lifestyle

The availability of public information can inspire citizens to participate in physical activity and sport, both by providing examples of its health benefits and by facilitating access to opportunities that suit their individual preferences and needs.

An example of the possibilities of open data in this field can be found in the project OpenActive. It is an initiative launched in 2016 by the Open Data Institute (ODI) together with Sport England, a public body aimed at encouraging physical activity for everyone in England. OpenActive allows sport providers to publish standardised open data, based on a standard developed by the ODI to ensure quality, interoperability and reliability. These data have enabled the development of tools to facilitate the search for and booking of activities, thus helping to combat citizens' physical inactivity. According to an external impact assessment, this project could have helped prevent up to 110 premature deaths, saved up to £3 million in healthcare costs and generated up to £20 million in productivity gains per year. In addition, it has had a major economic impact for the operators participating that share their data by increasing their customers and thus their profits.

Optimisation of physical work

The data provides teams, coaches and athletes with access to a wealth of information about competitions and their performance, allowing them to conduct detailed analysis and find ways to improve. This includes data on game statistics, health, etc.

In this respect, the French National Agency for Sport, together with the National Institute for Sport, Experience and Performance (INSEP) and the General Directorate for Sport, have developed the Sport Data Hub - FFS. The project was born in 2020 with the idea of boosting the individual and collective performance of French sport in the run-up to the Olympic Games in Pays 2024. It consists of the creation of a collaborative tool for all those involved in the sports movement (federations, athletes, coaches, technical teams, institutions and researchers) to share data that allows for aggregate comparative analysis at national and international level.

Research to understand the impact of data in areas such as health and the economy

Data related to physical activity can also be used in scientific research to analyse the effects of exercise on health and help prevent injury or disease. They can also help us to understand the economic impact of sporting activities.

As an example, in 2021 the European Commission launched the report Mapping of sport statistics and data in the EUwith data on the economic and social impact of sport, both at EU and member state level, between 2012 and 2021. The study identifies available data sources and collects quantitative and qualitative data. These data are used to compile a series of indicators of the impact of sport on the economy and society, including an entire section focusing on health.

This type of study can be used by public bodies to develop policies to promote these activities and to provide citizens with sport-related services and resources adapted to their specific needs. A measure that could help to prevent diseases and thus save on health costs.

Where to find open data related to sport?

In order to carry out these projects, reliable data sources are needed. At datos.gob.es you can find various datasets on this topic. Most of them have been published by local administrations and refer to sports facilities and equipment, as well as events of this nature.

Within the National Catalogue, DEPORTEData stands out. It is a database of the Ministry of Education, Vocational Training and Sport for the storage and dissemination of statistical results in the field of sport. Through their website they offer magnitudes structured in two large blocks:

  •  Cross-sectional estimates on employment and enterprises, expenditure by households and public administration, education, foreign trade and tourism, all linked to sport.
  •  Sector-specific information, including indicators on federated sport, coach training, doping control, sporting habits, facilities and venues, as well as university and school championships.

At the European level, we can visit the European Open Data Portal (data.europa.eu), with more than 40,000 datasets on sport, or Eurostat. And if we want to take a closer look at citizens' behaviour, we can go to the Eurobarometer on sport and physical activity, whose data can also be found on data.europa.eu. Similarly, at the global level, the World Health Organisation provides datasets on the effects of physical inactivity.

In conclusion, there is a need to promote the openness of quality, up-to-date and reliable data on sport. Information with a great impact not only for society, but also for the economy, and can help us improve the way we participate, compete and enjoy sport.

calendar icon
Documentación

1. Introduction

Visualizations are graphical representations of data that allow the information linked to them to be communicated in a simple and effective way. The visualization possibilities are very wide, from basic representations, such as line, bar or sector graphs, to visualizations configured on interactive dashboards. 

In this "Step-by-Step Visualizations" section we are regularly presenting practical exercises of open data visualizations available in datos.gob.es or other similar catalogs. They address and describe in a simple way the stages necessary to obtain the data, perform the transformations and analyses that are relevant to, finally, enable the creation of interactive visualizations that allow us to obtain final conclusions as a summary of said information. In each of these practical exercises, simple and well-documented code developments are used, as well as tools that are free to use. All generated material is available for reuse in the GitHub Data Lab repository.

Then, and as a complement to the explanation that you will find below, you can access the code that we will use in the exercise and that we will explain and develop in the following sections of this post.

Access the data lab repository on Github.

Run the data pre-processing code on top of Google Colab.

Back to top

 

2. Objetive

The main objective of this exercise is to show how to perform a network or graph analysis based on open data on rental bicycle trips in the city of Madrid. To do this, we will perform a preprocessing of the data in order to obtain the tables that we will use next in the visualization generating tool, with which we will create the visualizations of the graph.

Network analysis are methods and tools for the study and interpretation of the relationships and connections between entities or interconnected nodes of a network, these entities being persons, sites, products, or organizations, among others. Network analysis seeks to discover patterns, identify communities, analyze influence, and determine the importance of nodes within the network. This is achieved by using specific algorithms and techniques to extract meaningful insights from network data.

Once the data has been analyzed using this visualization, we can answer questions such as the following: 

  • What is the network station with the highest inbound and outbound traffic? 
  • What are the most common interstation routes?
  • What is the average number of connections between stations for each of them?
  • What are the most interconnected stations within the network?

Back to top

 

3. Resources

3.1. Datasets

The open datasets used contain information on loan bike trips made in the city of Madrid. The information they provide is about the station of origin and destination, the time of the journey, the duration of the journey, the identifier of the bicycle, ...

These open datasets are published by the Madrid City Council, through files that collect the records on a monthly basis.

These datasets are also available for download from the following Github repository.

 

3.2. Tools

To carry out the data preprocessing tasks, the Python programming language written on a Jupyter Notebook hosted in the Google Colab cloud service has been used.

"Google Colab" or, also called Google Colaboratory, is a cloud service from Google Research that allows you to program, execute and share code written in Python or R on a Jupyter Notebook from your browser, so it does not require configuration. This service is free of charge.

For the creation of the interactive visualization, the Gephi tool has been used.

"Gephi" is a network visualization and analysis tool. It allows you to represent and explore relationships between elements, such as nodes and links, in order to understand the structure and patterns of the network. The program requires download and is free.

If you want to know more about tools that can help you in the treatment and visualization of data, you can use the report "Data processing and visualization tools".

Back to top

 

4. Data processing or preparation

The processes that we describe below you will find them commented in the Notebook that you can also run from Google Colab.

Due to the high volume of trips recorded in the datasets, we defined the following starting points when analysing them:

  • We will analyse the time of day with the highest travel traffic
  • We will analyse the stations with a higher volume of trips

Before launching to analyse and build an effective visualization, we must carry out a prior treatment of the data, paying special attention to its obtaining and the validation of its content, making sure that they are in the appropriate and consistent format for processing and that they do not contain errors.

As a first step of the process, it is necessary to perform an exploratory analysis of the data (EDA), in order to properly interpret the starting data, detect anomalies, missing data or errors that could affect the quality of subsequent processes and results. If you want to know more about this process you can resort to the Practical Guide of Introduction to Exploratory Data Analysis

The next step is to generate the pre-processed data table that we will use to feed the network analysis tool (Gephi) that will visually help us understand the information. To do this, we will modify, filter and join the data according to our needs.

The steps followed in this data preprocessing, explained in this Google Colab Notebook, are as follows:

  1. Installation of libraries and loading of datasets
  2. Exploratory Data Analysis (EDA)
  3. Generating pre-processed tables

You will be able to reproduce this analysis with the source code that is available in our GitHub account. The way to provide the code is through a document made on a Jupyter Notebook that, once loaded into the development environment, you can easily run or modify.

Due to the informative nature of this post and to favour the understanding of non-specialized readers, the code is not intended to be the most efficient but to facilitate its understanding, so you will possibly come up with many ways to optimize the proposed code to achieve similar purposes. We encourage you to do so!

Back to top

 

5. Network analysis

5.1. Definition of the network

The analysed network is formed by the trips between different bicycle stations in the city of Madrid, having as main information of each of the registered trips the station of origin (called "source") and the destination station (called "target").

The network consists of 253 nodes (stations) and 3012 edges (interactions between stations). It is a directed graph, because the interactions are bidirectional and weighted, because each edge between the nodes has an associated numerical value called "weight" which in this case corresponds to the number of trips made between both stations.

5.2. Loading the pre-processed table in to Gephi

Using the "import spreadsheet" option on the file tab, we import the previously pre-processed data table in CSV format. Gephi will detect what type of data is being loaded, so we will use the default predefined parameters.

 

Figure 1. Uploading data to Gephi
 
 

5.3. Network display options

5.3.1 Distribution window

First, we apply in the distribution window, the Force Atlas 2 algorithm. This algorithm uses the technique of node repulsion depending on the degree of connection in such a way that the sparsely connected nodes are separated from those with a greater force of attraction to each other.

To prevent the related components from being out of the main view, we set the value of the parameter "Severity in Tuning" to a value of 10 and to avoid that the nodes are piled up, we check the option "Dissuade Hubs" and "Avoid overlap".

 

Figure 2. Distribution window - Force Atlas 2 overlap
 

Dentro de la ventana de distribución, también aplicamos el algoritmo de Expansión con la finalidad de que los nodos no se encuentren tan juntos entre sí mismos.

Figure 3. Distribution window - Expansion algorithm

5.3.2 Appearance window

Next, in the appearance window, we modify the nodes and their labels so that their size is not equal but depends on the value of the degree of each node (nodes with a higher degree, larger visual size). We will also modify the colour of the nodes so that the larger ones are a more striking colour than the smaller ones. In the same appearance window we modify the edges, in this case we have opted for a unitary colour for all of them, since by default the size is according to the weight of each of them.

A higher degree in one of the nodes implies a greater number of stations connected to that node, while a greater weight of the edges implies a greater number of trips for each connection.

Figure 4. Appearance window

5.3.3 Graph window

Finally, in the lower area of the interface of the graph window, we have several options such as activating / deactivating the button to show the labels of the different nodes, adapting the size of the edges in order to make the visualization cleaner, modify the font of the labels, ...

Figure 5. Options graph window
 

Next, we can see the visualization of the graph that represents the network once the visualization options mentioned in the previous points have been applied.

Figure 6. Graph display

 

Activating the option to display labels and placing the cursor on one of the nodes, the links that correspond to the node and the rest of the nodes that are linked to the chosen one through these links will be displayed.

Next, we can visualize the nodes and links related to the bicycle station "Fernando el Católico". In the visualization, the nodes that have a greater number of connections are easily distinguished, since they appear with a larger size and more striking colours, such as "Plaza de la Cebada" or "Quevedo".

Figure 7. Graph display for station "Fernando el Católico"
 

5.4 Main network measures

Together with the visualization of the graph, the following measurements provide us with the main information of the analysed network. These averages, which are the usual metrics when performing network analytics, can be calculated in the statistics window.

Figure 8. Statistics window

 

  • Nodes (N): are the different individual elements that make up a network, representing different entities. In this case the different bicycle stations. Its value on the network is 243
  • Links (L): are the connections that exist between the nodes of a network. Links represent the relationships or interactions between the individual elements (nodes) that make up the network. Its value in the network is 3014
  • Maximum number of links (Lmax): is the maximum possible number of links in the network. It is calculated by the following formula Lmax= N(N-1)/2. Its value on the network is 31878
  • Average grade (k): is a statistical measure to quantify the average connectivity of network nodes. It is calculated by averaging the degrees of all nodes in the network. Its value in the network is 23.8
  • Network density ​(d): indicates the proportion of connections between network nodes to the total number of possible connections. Its value in the network is 0.047
  • Diámetro (dmax ): is the longest graph distance between any two nodes of the res, i.e., how far away the 2 nodes are farther apart. Its value on the network is 7
  • Mean distance ​(d):is the average mean graph distance between the nodes of the network. Its value on the network is 2.68
  • Mean clustering coefficient ​(C): Indicates how nodes are embedded between their neighbouring nodes. The average value gives a general indication of the grouping in the network. Its value in the network is 0.208
  • Related component​: A group of nodes that are directly or indirectly connected to each other but are not connected to nodes outside that group. Its value on the network is 24

 

5.5 Interpretation of results

The probability of degrees roughly follows a long-tail distribution, where we can observe that there are a few stations that interact with a large number of them while most interact with a low number of stations.

The average grade is 23.8 which indicates that each station interacts on average with about 24 other stations (input and output).

In the following graph we can see that, although we have nodes with degrees considered as high (80, 90, 100, ...), it is observed that 25% of the nodes have degrees equal to or less than 8, while 75% of the nodes have degrees less than or equal to 32.

Figure 9. Grade Allocation Chart
 

The previous graph can be broken down into the following two corresponding to the average degree of input and output (since the network is directional). We see that both have similar long-tail distributions, their mean degree being the same of 11.9

Its main difference is that the graph corresponding to the average degree of input has a median of 7 while the output is 9, which means that there is a majority of nodes with lower degrees in the input than the output.

Figure 10. Graphs distribution of degrees of input and output
 
 
 

The value of the average grade with weights is 346.07 which indicates the average of total trips in and out of each station.

Figure 11. Graph distribution of degrees with weights
 

The network density of 0.047 is considered a low density indicating that the network is dispersed, that is, it contains few interactions between different stations in relation to the possible ones. This is considered logical because connections between stations will be limited to certain areas due to the difficulty of reaching stations that are located at long distances.

The average clustering coefficient is 0.208 meaning that the interaction of two stations with a third does not necessarily imply interaction with each other, that is, it does not necessarily imply transitivity, so the probability of interconnection of these two stations through the intervention of a third is low.

Finally, the network has 24 related components, of which 2 are weak related components and 22 are strong related components.

 

5.6 Centrality analysis

A centrality analysis refers to the assessment of the importance of nodes in a network using different measures. Centrality is a fundamental concept in network analysis and is used to identify key or influential nodes within a network. To perform this task, you start from the metrics calculated in the statistics window.

  • The degree centrality measure indicates that the higher the degree of a node, the more important it is. The five stations with the highest values are: 1º Plaza de la Cebada, 2º Plaza de Lavapiés, 3º Fernando el Católico, 4º Quevedo, 5º Segovia 45.

Figure 12. Graph visualization degree centrality
 
  • The closeness centrality indicates that the higher the proximity value of a node, the more central it is, since it can reach any other node in the network with the least possible effort. The five stations with the highest values are: 1º Fernando el Católico 2º General Pardiñas, 3º Plaza de la Cebada, 4º Plaza de Lavapiés, 5º Puerta de Madrid.

Figure 13. Measured closeness centrality distribution

 

Figure 14. Graphic visualization closeness centrality
 
  • The measure of betweenness centrality indicates that the greater the intermediation measure of a node, the more important it is since it is present in more interaction paths between nodes than the rest of the nodes in the network. The five stations with the highest values are: 1º Fernando el Católico, 2º Plaza de Lavapiés, 3º Plaza de la Cebada, 4º Puerta de Madrid, 5º Quevedo.

Figure 15. Measured betweenness centrality distribution
 
FIgure 16. Graphic visualization betweenness centrality
 

With the Gephi tool you can calculate a large number of metrics and parameters that are not reflected in this study, such as the eigenvector measure or centrality distribution "eigenvector".

 

5.7 Filters

Through the filtering window, we can select certain parameters that simplify the visualizations in order to show relevant information of network analysis in a clearer way visually.

Figure 17. Filtering windows

Next, we will show several filtered performed:

  • Range (grade) filtering, which shows nodes with a rank greater than 50, assuming 13.44% (34 nodes) and 15.41% (464 edges).

Figure 18. Graph display filtered range (degree)
 
  • Edge filtering (edge weight), showing edges weighing more than 100, assuming 0.7% (20 edges).

Figure 19. Visualization graph edge filtering (weight)

 

Within the filters window, there are many other filtering options on attributes, ranges, partition sizes, edges, ... with which you can try to make new visualizations to extract information from the graph. If you want to know more about the use of Gephi, you can consult the following courses and trainings about the tool.

Back to top

 

6. Conclusions of the exercice

Once the exercise is done, we can appreciate the following conclusions:

  • The three stations most interconnected with other stations are Plaza de la Cebada (133), Plaza de Lavapiés (126) and Fernando el Católico (114).
  • The station that has the highest number of input connections is Plaza de la Cebada (78), while the one with the highest number of exit connections is Plaza de Lavapiés with the same number as Fernando el Católico (57).
  • The three stations with the highest number of total trips are Plaza de la Cebada (4524), Plaza de Lavapiés (4237) and Fernando el Católico (3526).
  • There are 20 routes with more than 100 trips. Being the 3 routes with a greater number of them: Puerta de Toledo – Plaza Conde Suchil (141), Quintana Fuente del Berro – Quintana (137), Camino Vinateros – Miguel Moya (134).
  • Taking into account the number of connections between stations and trips, the most important stations within the network are: Plaza la Cebada, Plaza de Lavapiés and Fernando el Católico.

We hope that this step-by-step visualization has been useful for learning some very common techniques in the treatment and representation of open data. We will be back to show you further reuses. See you soon!

Back to top

calendar icon