Data visualization: the best charts for representing comparisons

Blog

Data is a valuable source of knowledge for society. Public commitment to achieving data openness, public-private collaboration on data, and the development of applications with open data are actions that are part of the data economy, which seeks the innovative, ethical, and practical use of data to improve social and economic development.

It is as important to achieve public access and use of data as it is to properly convey that valuable information. To choose the best chart for each type of data, it is necessary to identify the type of variables and the relationship between them.

When comparing data, we must ensure that the variables are of the same nature (both quantitative or both qualitative), that they are expressed in the same unit of measurement, and that their content is comparable.

Below we present different visualizations, their usage rules, and the situations in which each type is most appropriate. The examples range from the simplest, such as bar charts, to less well-known ones, such as heat maps or cumulative comparison charts.

Bar charts

A visualization that represents data on two axes: one holds qualitative or time data and the other shows quantitative values. Because one of the axes can show temporal data, it is also used to analyze trends. If the axes are swapped, the result is a column chart.

Best practices:

Display the value labels on the axis, and reserve tooltips for secondary data.
Use it to represent fewer than 10 data points. When visualizing more data points, a line chart may be more appropriate.
Clearly differentiate real data from estimates.
Combine with a line chart to show trends or averages.
When neither variable is temporal, place the variable with the longer descriptions on the vertical axis.

Source: El Orden Mundial https://elordenmundial.com/mapas-y-graficos/comercio-fertilizantes-mundo/
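The rules above can be illustrated with a minimal Python sketch that draws a horizontal bar chart as plain text. It shows three things: bars that start at zero, bar lengths proportional to their values, and a value label on every bar. The function name `text_bar_chart` and the sample figures are hypothetical and used only for illustration. The sketch also assumes non-negative values with at least one positive value.

```python
def text_bar_chart(data, width=40):
    """Render a horizontal bar chart as lines of text.

    Assumes non-negative values with at least one positive value.
    Bars start at zero, their length is proportional to the value,
    and each bar is followed by its value label.
    """
    if len(data) >= 10:
        # Guideline above: with 10 or more points, a line chart may work better.
        raise ValueError("10 or more data points: consider a line chart")
    max_value = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / max_value * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value:g}")
    return lines


# Illustrative values only, not real data.
for line in text_bar_chart({"Spain": 12.5, "France": 9.0, "Italy": 4.2}, width=20):
    print(line)
```

Scaling every bar against the largest value keeps the zero baseline implicit. The same proportions would apply if the bars were passed to a real plotting library.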

Clustered bar charts

A type of bar chart in which each data category is further divided into two or more subcategories, so the comparison can take more factors into account.

Best practices:

Limit the number of categories to avoid showing too much information on the chart.
Include a maximum of three or four subcategories within each category. If more groupings need to be shown, consider using stacked bars or a set of charts.
Choose contrasting colors to differentiate the bars of each subcategory.

Source: RTVE https://www.rtve.es/noticias/20230126/pobreza-energetica-espana/2417050.shtml
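The layout of a clustered bar chart comes down to simple arithmetic: each category has a center, and its subcategory bars sit side by side around it. Here is a minimal Python sketch, assuming category i is centered at x = i and the group of bars takes up a hypothetical `group_width` of 0.8 of each unit. The resulting positions could be passed, one subcategory at a time, as the `x` argument of a plotting call such as matplotlib's `bar(x, height, width=...)`.

```python
def clustered_bar_positions(n_categories, n_subcategories, group_width=0.8):
    """Return the x-center of each bar in a clustered bar chart, plus the bar width.

    Category i is centered at x = i. Its subcategory bars share
    `group_width` of the unit space and sit side by side around that center.
    """
    if n_subcategories > 4:
        # Best practice above: at most three or four subcategories per category.
        raise ValueError("too many subcategories: consider stacked bars or several charts")
    bar_width = group_width / n_subcategories
    centre_offset = (n_subcategories - 1) / 2
    positions = []
    for i in range(n_categories):
        positions.append([i + (j - centre_offset) * bar_width for j in range(n_subcategories)])
    return positions, bar_width


positions, bar_width = clustered_bar_positions(3, 2)
print(positions, bar_width)
```

The guard mirrors the three-or-four-subcategory rule. Raising an error is one design choice; a tool could instead fall back to stacked bars automatically.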

Cumulative comparison charts

These charts show the composition of each category cumulatively. In addition to comparing variables, they show how each category is segmented. They can be stacked bar charts or stacked (cumulative) area charts.

Best practices:

Avoid stacked bar charts when the goal is to compare individual segments across categories, because only the bottom segment shares a common baseline. In that case, it is better to use multiple charts.
Limit the number of subcategories in stacked bar charts or segments in area charts.
Apply contrast in colors between categories and adhere to accessibility principles.

Source: Newtral https://www.newtral.es/medallas-espana-eurobasket/20220917/
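What makes a stacked bar chart cumulative is that each segment starts where the previous one ended. The minimal Python sketch below computes those starting points as running totals. In matplotlib, these are the values that would go into the `bottom` argument of `bar()`. The function name and the medal counts in the example are hypothetical.

```python
def stacked_offsets(segments):
    """Compute where each segment of a stacked bar chart starts.

    `segments` maps a subcategory name to a list of values, one per category.
    The bottom of each segment is the running total of the segments below it.
    Returns the bottoms per subcategory and the total height of each bar.
    """
    n_categories = len(next(iter(segments.values())))
    running = [0.0] * n_categories
    bottoms = {}
    for name, values in segments.items():
        bottoms[name] = running.copy()
        running = [r + v for r, v in zip(running, values)]
    return bottoms, running


# Hypothetical medal counts for two tournaments.
bottoms, totals = stacked_offsets({"gold": [1, 2], "silver": [3, 0], "bronze": [2, 1]})
print(bottoms, totals)
```

The order of the dictionary sets the stacking order. Keeping it fixed across categories makes the segments easier to follow.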

Population pyramid

A combination of two horizontal bar charts that share a vertical axis, which serves as the starting point, and display two values that extend symmetrically to either side.

Best practices:

Define a common ordering criterion, such as age.
Represent the data as absolute numbers or as percentages, bearing in mind that the two values being compared add up to the total.

Source: El Español https://www.elespanol.com/quincemil/articulos/actualidad/asi-es-la-alarmante-piramide-de-poblacion-de-galicia-en-2021
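To follow the percentage rule, divide both sides of the pyramid by the same grand total, so that all bars together add up to 100%. The minimal Python sketch below does this. It also follows a common drawing convention: one side is stored as negative values so it extends to the left of the shared axis. The order of `age_groups` acts as the common ordering criterion. The function name, the choice of which sex goes on the left, and the sample counts are assumptions made for illustration.

```python
def pyramid_percentages(age_groups, males, females):
    """Convert counts per age group into signed percentages of the total.

    Both sides are divided by the same grand total, so all the bars
    together add up to 100%. Male values are made negative so they are
    drawn to the left of the shared vertical axis.
    """
    total = sum(males) + sum(females)
    return [
        (group, -100 * m / total, 100 * f / total)
        for group, m, f in zip(age_groups, males, females)
    ]


# Hypothetical counts, ordered by age.
for row in pyramid_percentages(["0-14", "15-64", "65+"], [120, 400, 150], [115, 390, 190]):
    print(row)
```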

Radar chart

A circular visualization built on polar axes, used to represent measurements for categories that belong to the same theme. Each category has its own radial axis, and all the axes converge at the center of the chart.

Best practices:

Keep numerical data within the same range of values to avoid distorting the chart.
Limit the number of categories in each data series. Between four and seven categories is usually appropriate.
Group categories that are related or share a common hierarchy in one sector of the radar chart.

Source: Guía de visualización de datos para Entidades Locales https://redtransparenciayparticipacion.es/download/guia-de-visualizacion-de-datos-para-entidades-locales/
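One way to keep every category within the same range of values is min-max normalization: each value is rescaled to [0, 1] using the lowest and highest value possible on its axis. The minimal Python sketch below normalizes the values and places each one on its radial axis. It assumes the axes start at the top and run clockwise, and that the minimum and maximum for each category are known. The function name and sample figures are hypothetical.

```python
import math


def radar_points(values, minimums, maximums):
    """Normalize each category to [0, 1] and place it on its radial axis.

    Axis k points at an angle that starts at the top of the chart and
    moves clockwise in equal steps of 360 / n degrees.
    """
    n = len(values)
    points = []
    for k, (value, low, high) in enumerate(zip(values, minimums, maximums)):
        r = (value - low) / (high - low)  # min-max normalization to [0, 1]
        angle = math.pi / 2 - 2 * math.pi * k / n  # top of the chart, then clockwise
        points.append((r * math.cos(angle), r * math.sin(angle)))
    # Repeat the first point so the outline closes when drawn as a polygon.
    points.append(points[0])
    return points


# Hypothetical indicators measured on very different scales.
print(radar_points([7, 60, 0.4, 1.5], [0, 0, 0, 0], [10, 100, 1, 2]))
```

Without normalization, the indicator measured on the 0 to 100 scale would dominate the shape. That is the kind of distortion the best practice warns about.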

Heatmap

A graphical representation in table format that compares two different dimensions, with values distinguished by degrees of color intensity or by traffic-light codes.

Best practices:

Indicate the value in each cell, because color is only an approximate indicator. In interactive graphics, values can be shown in a tooltip.
Include a legend in the graphic to explain the meaning of the color scale.
Use colors that are accessible to everyone and have recognizable meanings, such as gradients, hot-cold scales, or traffic-light colors.
Limit the amount of information represented as much as possible.

Source: eldiario.es https://www.eldiario.es/sociedad/clave-saturacion-primaria-ratios-mitad-medicos-asignados-1-500-pacientes_1_9879407.html
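A traffic-light heatmap assigns each cell a color class based on thresholds, and it keeps the number next to the color, as the first best practice recommends. Below is a minimal Python sketch. The thresholds are hypothetical and would have to come from the subject matter; the example uses patients per doctor with cut-offs at 1,200 and 1,500, and the cell values are invented.

```python
def traffic_light(value, amber_from, red_from):
    """Classify a value into a traffic-light color class.

    Values below `amber_from` are green, values below `red_from` are amber,
    and everything else is red. Both thresholds are domain assumptions.
    """
    if value < amber_from:
        return "green"
    if value < red_from:
        return "amber"
    return "red"


def heatmap_cells(table, amber_from, red_from):
    """Return (value, color) for every cell so the value stays visible."""
    return [[(v, traffic_light(v, amber_from, red_from)) for v in row] for row in table]


# Hypothetical patients per doctor, by region (rows) and year (columns).
for row in heatmap_cells([[1100, 1350], [1520, 1700]], amber_from=1200, red_from=1500):
    print(row)
```

Keeping the value in each cell means the color only reinforces the number. This also helps readers who cannot tell the colors apart.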

Bubble chart

A variation of the scatter plot that represents an additional dimension through the size of each bubble. Different colors can also be assigned to associate groups or separate categories. Besides comparing variables, the bubble chart is useful for analyzing frequency distributions. This type of visualization is common in infographics, where highlighting differences in the magnitude of values matters more than knowing the exact data.

Best practices:

Avoid overlapping bubbles so that the information remains clear.
Display value labels whenever the number of bubbles allows it.

Source: Civio https://civio.es/el-boe-nuestro-de-cada-dia/2022/07/07/decretos-ley-desde-1996/
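Bubble size is usually read as area, and area grows with the square of the radius. So if the radius is scaled linearly with the value, large values look exaggerated. The minimal Python sketch below makes the area proportional to the value by taking the radius from the square root. The function name, the `max_radius` default and the sample values are hypothetical.

```python
import math


def bubble_radii(values, max_radius=30.0):
    """Scale bubble radii so that bubble area is proportional to value.

    The largest value gets `max_radius`. A value a quarter as large gets
    half the radius, and therefore a quarter of the area.
    """
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical yearly counts.
print(bubble_radii([100, 25, 4], max_radius=10))
```

Some plotting tools already size markers by area. For example, the `s` parameter of matplotlib's `scatter` is the marker area in points squared, so values proportional to the data can be passed to it directly without this conversion.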

Word cloud

A visual graphic that displays words in sizes that vary according to their frequency in a dataset. This type of visualization is built with natural language processing (NLP), a field of artificial intelligence that uses machine learning, among other techniques, to interpret text and data.

Best practices:

This resource is recommended for infographics where a visual approximation matters more than the exact figure.
Try to use words of similar length, because long words take up more space and can look more important than their frequency justifies.
Make it easier to read by showing the words horizontally.
Present the words in a single color to maintain a neutral representation.

This graphic visualization, which we published in a step-by-step article, is a word cloud of several texts from datos.gob.es.
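At its core, a word cloud rests on a basic NLP preprocessing step: split the text into words, drop very common words, count the rest, and map the counts to font sizes. The minimal Python sketch below does only that. It is not the pipeline used for the datos.gob.es example, and real tools usually add further steps such as lemmatization. The stop-word list is a tiny hypothetical sample, and the size range is an assumption.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on", "with"}


def word_sizes(text, top_n=20, min_size=12, max_size=48):
    """Count word frequencies and map them linearly to font sizes."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    most_common = counts.most_common(top_n)
    if not most_common:
        return {}
    highest = most_common[0][1]
    lowest = most_common[-1][1]
    span = highest - lowest or 1  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in most_common
    }


print(word_sizes("Open data portals publish open data for the public", top_n=5))
```

A single color and horizontal words, as the best practices suggest, can then be applied by whatever tool renders these sizes.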

So far, we have explained the most common types of comparison charts, highlighting examples in media and reference sources. However, we can find more visualization models for comparing data in the Data Visualization Guide for Local Entities, which has served as a reference for creating this post and others that we will publish soon. This article is part of a series of posts on how to create different types of visualizations based on the relationship of the data and the objective of each exercise.

As the popular saying goes, "a picture is worth a thousand words", which could be adapted to "a chart is worth a thousand numbers". Data visualization makes information that might seem complex at first easy to understand.

22/02/2023

How to choose the right chart to visualise open data


A statistical graph is a visual representation of a series of data, designed to highlight a specific aspect of reality. However, organising a set of data in an informative way is not an easy task, especially if we want to capture the viewer's attention and present the information accurately.

Making data easy to compare requires a minimum of statistical knowledge, in order to highlight trends, avoid misleading visualisations and illustrate the message to be conveyed. Therefore, the type of visualisation we choose must depend on the type of relationship between the data we want to illustrate. In other words, representing a numerical ranking is not the same as representing the degree of correlation between two variables.

To help choose the most appropriate graphs for each kind of information, we have selected the most recommended graphs for each type of relationship between numerical variables. In preparing this content, we have used as references the Data Visualisation Guide for local entities recently published by the FEMP's RED de Entidades Locales por la Transparencia y Participación Ciudadana, as well as this infographic prepared by the Financial Times.

Deviation

It is used to highlight numerical variations from a fixed reference point. Usually, the reference point is zero, but it can also be a target or a long-term average. In addition, this type of graph is useful to show sentiments (positive, neutral or negative). The most common charts are:

Diverging bar: A simple standard bar chart that can handle both negative and positive magnitude values.
Spine chart: Splits a single value into two contrasting components (e.g. male/female).
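A deviation chart plots the signed difference between each value and the reference point, whether that point is zero, a target or a long-term average. The minimal Python sketch below computes those differences. Positive results would become bars on one side of the axis in a diverging bar chart, and negative results bars on the other. The function names and sample figures are hypothetical.

```python
from statistics import mean


def deviations(values, reference=0.0):
    """Signed difference of each value from a fixed reference point."""
    return [v - reference for v in values]


def deviations_from_average(values):
    """Use the average of the series itself as the reference point."""
    return deviations(values, reference=mean(values))


# Hypothetical monthly results compared against a target of 100.
print(deviations([92, 105, 100, 118], reference=100))
```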

Correlation

Useful for showing the relationship between two or more variables. Note that unless you tell them otherwise, many readers will assume that the relationships you show them are causal. Here are some of the most common graphs:

Scatter plot: The standard way of showing the relationship between two continuous variables, each of which has its own axis.
Column + line timeline: A good way to show the relationship between an amount (columns) and a rate (line).
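Before drawing a scatter plot, it can help to quantify how strong the relationship is. The minimal Python sketch below computes the Pearson correlation coefficient, which ranges from -1 to 1. It is offered as a companion to the charts, not as part of the original guidance. Keep in mind that the coefficient only measures linear association and says nothing about causation, which is exactly the caution raised above. On Python 3.10 or later, `statistics.correlation` gives the same result. The function name and sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations.
print(pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.8]))
```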

Sorting

Sorting numerical variables is necessary when the position of an item in an ordered list is more important than its absolute or relative value. The following graphs can be used to highlight points of interest.

Bar chart: When the bars are sorted, this type of visualisation makes the rank of each value easy to read.
Dot-strip chart: The values are placed in order along a strip. This layout is a space-efficient way of showing ranks across multiple categories.
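When position matters more than value, the data first has to be sorted and ranked. The minimal Python sketch below sorts items from largest to smallest and gives tied values the same rank (1, 2, 2, 4), so equal values are not shown as different positions. This tie rule, often called standard competition ranking, is an assumption; other tie-breaking rules exist. The function name and sample data are hypothetical.

```python
def rank_descending(data):
    """Sort items by value, largest first, and attach 1-based ranks.

    Tied values share the same rank (standard competition ranking).
    """
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    ranked = []
    previous_value = None
    rank = 0
    for position, (label, value) in enumerate(ordered, start=1):
        if value != previous_value:
            rank = position
            previous_value = value
        ranked.append((rank, label, value))
    return ranked


# Hypothetical scores per municipality.
for row in rank_descending({"Alpha": 72, "Beta": 91, "Gamma": 72, "Delta": 40}):
    print(row)
```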

Distribution

This type of graph shows the values in a data set and how often they occur. In other words, it shows how the values of a variable are spread out, which helps to identify outliers and patterns.

The shape of a distribution can itself be a striking way to highlight a lack of uniformity or equality in the data. The most recommended visualisations for representing, for example, age or gender distributions are the following:

Histogram: This is the most common way of showing a statistical distribution. To develop it, it is recommended to keep a small space between the columns in order to highlight the "shape" of the data.
Box plot: Effective for visualising multiple distributions by showing the median (centre) and range of the data.
Population pyramid: The standard way of showing how a population is distributed by age and sex. In fact, it is a combination of two horizontal bar charts sharing the vertical axis.
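The figures behind a histogram and a box plot can be computed with a few lines of Python. The minimal sketch below counts values in equal-width bins and, for the box plot, computes the five-number summary (minimum, quartiles and maximum). It uses `statistics.quantiles` (Python 3.8+) with its default "exclusive" method. Other tools use different quartile methods, so their results may differ slightly. The function names and sample data are hypothetical.

```python
from statistics import quantiles


def histogram(values, n_bins):
    """Count how many values fall into each of n equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) / width) if width else 0
        counts[min(index, n_bins - 1)] += 1  # the maximum value goes into the last bin
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts


def box_plot_summary(values):
    """Five-number summary that a box plot draws."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


ages = [3, 8, 15, 22, 23, 31, 38, 40, 47, 52, 61, 67, 74, 88]  # hypothetical ages
print(histogram(ages, 4))
print(box_plot_summary(ages))
```

When the bins are drawn, making each column slightly narrower than the bin (for example, 90% of `width`) leaves the small gaps recommended above for highlighting the shape of the data.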

Changes over time

Through this combination of numerical variables it is possible to emphasise changing trends. These can be short movements or extended series spanning decades or centuries. Choosing the right time period to represent is key to providing context for the reader.

Line graph: This is the standard way to show a changing time series. If the data is very irregular, markers can help to represent the individual data points.
Calendar heat map: Used to show temporal patterns (daily, weekly, monthly), at the expense of precision in the quantities shown.
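A calendar heat map arranges daily values in a grid of weeks and weekdays. Each cell is then coloured by its value. The minimal Python sketch below builds that grid with the standard `datetime` module. Rows are consecutive weeks starting on Monday, and columns are weekdays (0 = Monday, 6 = Sunday). Days without data stay as `None` so they can be drawn as empty cells. The function name and sample readings are hypothetical.

```python
from datetime import date, timedelta


def calendar_grid(daily_values):
    """Arrange daily values into a (week, weekday) grid for a calendar heat map.

    `daily_values` maps a datetime.date to a number.
    """
    first = min(daily_values)
    last = max(daily_values)
    start = first - timedelta(days=first.weekday())  # Monday of the first week
    n_weeks = (last - start).days // 7 + 1
    grid = [[None] * 7 for _ in range(n_weeks)]
    for day, value in daily_values.items():
        week = (day - start).days // 7
        grid[week][day.weekday()] = value
    return grid


# Hypothetical daily readings.
readings = {date(2023, 1, 2) + timedelta(days=i): (i * 7) % 10 for i in range(14)}
for week in calendar_grid(readings):
    print(week)
```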

Magnitude

It is useful for visualising size comparisons. These can be relative (simply seeing which is bigger or larger) or absolute (requiring finer differences to be visible). They usually show variables that can be counted (e.g. barrels, dollars or people), rather than a calculated rate or percentage.

Column chart: One of the most common ways to compare the size of things. The axis should always start at 0.
Marimekko chart: Ideal for showing the size and proportion of data at the same time, as long as the data is not too complex.
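A Marimekko chart shows size and proportion at once by encoding each in a different direction. A column's width is its share of the grand total, and a segment's height is its share of that column, so every rectangle's area is proportional to its value. The minimal Python sketch below computes that layout on a unit square. The function name and sample figures are hypothetical.

```python
def marimekko_layout(table):
    """Compute (column, segment, x, y, width, height) rectangles on a unit square.

    `table` maps each column to a dict of segment values.
    """
    grand_total = sum(sum(segments.values()) for segments in table.values())
    layout = []
    x = 0.0
    for column, segments in table.items():
        column_total = sum(segments.values())
        width = column_total / grand_total  # size: share of the grand total
        y = 0.0
        for segment, value in segments.items():
            height = value / column_total  # proportion: share within the column
            layout.append((column, segment, x, y, width, height))
            y += height
        x += width
    return layout


# Hypothetical spending by area and type.
for rect in marimekko_layout({"North": {"public": 30, "private": 10}, "South": {"public": 20, "private": 40}}):
    print(rect)
```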

Part of a whole

These types of numerical combinations are useful for showing how a single entity can be broken down into its component elements. For example, part-to-whole charts are commonly used to represent budget allocations or election results.

Pie chart: One of the most common charts for showing part-to-whole data. Keep in mind that it is difficult to compare the size of the different segments accurately.
Stacked Venn: Limited to schematic representations of interrelationships or overlaps.
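In a pie chart, each slice's angle is its share of the total multiplied by 360 degrees, so the slices always close the full circle. The minimal Python sketch below computes the start and end angles of each slice. Because segment sizes are hard to compare by eye, printing each slice's percentage next to it is a common complement. The function name and the vote counts are hypothetical.

```python
def pie_angles(shares):
    """Convert category values into (start, end) angles in degrees."""
    total = sum(shares.values())
    angles = {}
    start = 0.0
    for label, value in shares.items():
        sweep = 360.0 * value / total
        angles[label] = (start, start + sweep)
        start += sweep
    return angles


# Hypothetical election results (votes per party).
votes = {"Party A": 5200, "Party B": 3100, "Party C": 1700}
for party, (start, end) in pie_angles(votes).items():
    share = 100 * votes[party] / sum(votes.values())
    print(f"{party}: {start:.1f} to {end:.1f} degrees ({share:.1f}%)")
```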

Spatial

This type of graph is used when precise locations or geographic patterns in the data are more important to the reader than anything else. Some of the most commonly used are:

Choropleth map: This is the standard approach to placing data on a map.
Flow map: This is used to show movement of any kind within a single map. For example, it can be used to represent migratory movements.

Knowing the different options for statistical representation helps to create more accurate data visualisations, which in turn give a clearer picture of reality. In a context where visual information is becoming increasingly important, it is essential to develop the tools needed for the information contained in data to reach the public and help improve society.

05/01/2023