Quantifying the value of data

Publication date: 26-09-2022

What is the Value of Data? A review of empirical methods, Diane Coyle and Annabel Manley, Policy Brief, July 2022.

There is a recurring question that has been around since the beginning of the open data movement, and as efforts and investments in data collection and publication have increased, it has resonated more and more strongly: What is the value of a dataset?

This is an extremely difficult question to answer given not only the inherent complexity of the data itself, which grows exponentially as we begin to combine it, but also the different points of view from which the question of value can be approached.

  • If we know that the value will not be immediate, how can we foresee and quantify the potential benefits at some point in the future?
  • Could the value of data become negative in some cases, if we can also cause some kind of 'harm' with it?
  • Can the value of data degrade over time?

In this space we have repeatedly analysed the value of open data for public administrations from different angles: high-value datasets and how to identify them, the perspective of data providers, the keys to the value of data, how to generate value through data, and the value of real-time data. Analysis and research in this area, however, continues to grow. In this regard, we would like to highlight a recently published paper from the University of Cambridge that describes some of the most common methods for data valuation.

Building on their previous analysis of the characteristics of data and its associated value, the authors review the methods currently in use and conclude that they can be grouped into several categories, whose characteristics are detailed below.

Methods based on cost analysis

This approach is based on the traditional statistical principle of the "sum of costs". It takes into account the costs of generating, collecting, storing and replacing datasets, as well as the cost the organisation would incur if the data were lost. These methods have the advantage of being relatively easy to calculate, but they face the difficulty of separating costs directly attributable to the data from other indirect costs related to, for example, the range of professional roles involved or the different software elements used.
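Purely as an illustration of this "sum of costs" logic, the sketch below adds up a few hypothetical direct cost categories for a single dataset; the category names and figures are invented for the example, and the hard part in practice is deciding which indirect costs to attribute to the data.

```python
# Minimal sketch of a "sum of costs" valuation.
# All cost categories and figures are hypothetical placeholders.

direct_costs = {
    "collection": 120_000,   # e.g. surveys, sensors, scraping
    "storage": 15_000,       # infrastructure and hosting
    "processing": 40_000,    # cleaning, documentation, quality control
    "replacement": 60_000,   # estimated cost of regenerating the data if lost
}

cost_based_value = sum(direct_costs.values())
print(f"Cost-based lower-bound value: {cost_based_value:,} €")
```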

An example of the application of this method is the case of Statistics Canada with its analysis of the valuation of the costs associated with investment in data, databases and data science in Canada.

Methods based on revenue analysis

In this case, expectations of revenue streams are used, taking as a reference the existing potential market for the exploitation of the data. This may take into account, for example, usage fees, trademarks or patents. The main limitations of these methods are that they generally require somewhat more subjective criteria and that it is complex to estimate the value when the data are exploited indirectly rather than directly, e.g. through analytics.
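As a rough illustration, a revenue-based estimate often amounts to discounting expected future revenue streams from the data (licence fees, subscriptions, etc.). The sketch below does this with invented cash flows and an assumed discount rate; it is not taken from any of the studies cited here.

```python
# Minimal sketch of a revenue-based valuation: discount expected future
# revenue streams attributable to the data.
# The cash flows and discount rate are hypothetical assumptions.

expected_revenue = [50_000, 80_000, 80_000, 60_000]  # years 1..4, in €
discount_rate = 0.08

present_value = sum(
    cash_flow / (1 + discount_rate) ** year
    for year, cash_flow in enumerate(expected_revenue, start=1)
)
print(f"Revenue-based estimate (NPV): {present_value:,.0f} €")
```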

These methods are used in the OECD study on the prospects for the value of data, which estimates the reported revenues related to the collection and sale of data using the US enterprise survey.

Methods based on market analysis

Generally, these are the preferred methods to use when all the elements necessary to make the calculations are available. However, today there is still a large amount of data in organisations for internal use only, which makes it difficult to use these methods, as their behaviour is not visible to the market. Furthermore, these methods cannot fully incorporate the social value of the data.

An example of this approach is the study carried out by the Economic Commission for Latin America and the Caribbean (ECLAC) on the data marketplaces launched by the European Union and the Government of Colombia, respectively.

Experiments and surveys

This approach to the value of data consists of assessing market sentiment in relation to the data by directly asking about the willingness to pay for certain data or to do without it. It is generally used when the public market value is not known or in cases where social value is important, for example in the environmental area. A limitation of these methods is that, when respondents are not specialists, it can be quite difficult for them to assess the possible uses of the data and thus its full value.
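A minimal sketch of how a stated-preference (willingness-to-pay) survey could be turned into a rough monetary figure is shown below; the responses, population size and naive aggregation are all assumptions made purely for illustration, and a real study would also need careful sampling and bias correction.

```python
# Minimal sketch of a willingness-to-pay (WTP) estimate from survey data.
# The responses and population size below are invented for illustration.
from statistics import mean

wtp_responses = [0, 5, 10, 10, 2, 0, 8, 15, 4, 6]  # stated €/month per respondent
population_size = 200_000                           # assumed target population

mean_wtp = mean(wtp_responses)
aggregate_value = mean_wtp * population_size * 12   # naive annual aggregate
print(f"Mean WTP: {mean_wtp:.2f} €/month, naive aggregate: {aggregate_value:,.0f} €/year")
```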

The study carried out by the UK Office for National Statistics is a clear example of such valuation methods.

Impact-based methods

In this case the assessment is carried out through experiments or case studies that analyse the causal effect on certain outcomes attributable to the data. This option is particularly useful for evidence-led policy makers, as it allows a cause-effect relationship to be established, making it easier to understand the benefits and to develop a narrative in favour of the use of the data. However, if the experiments are not well designed or are not well adjusted to the specific context we want to analyse, we run the risk of obtaining an excessively subjective assessment.
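As a simplified illustration, an impact-based estimate often boils down to comparing an outcome between units that had access to the data and a comparable control group. The sketch below uses invented outcome figures; a real evaluation would need a properly designed experiment or quasi-experiment to support the causal claim.

```python
# Minimal sketch of an impact-based estimate: compare an outcome between
# units with access to the dataset and a comparable control group.
# The outcome figures are invented for illustration only.
from statistics import mean

outcomes_with_data = [104, 110, 98, 115, 107]     # e.g. productivity index
outcomes_without_data = [96, 101, 93, 99, 97]

estimated_effect = mean(outcomes_with_data) - mean(outcomes_without_data)
print(f"Estimated average effect attributable to the data: {estimated_effect:.1f} points")
```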

The decision-based evaluation framework proposed by the Internet of Water Coalition is a good example of how to apply impact-focused methods to a particular case.

Actor-chain methods

The aim of these methods is to take a more comprehensive view, assessing the data from the perspectives of the different actors involved. This also means that evaluations can be more complex, as they bring in different definitions of what constitutes the value of the data. However, it also makes them the most appropriate approach when one wants to assess a data ecosystem as a whole. Moreover, their use is growing among organisations considering socially responsible investment.

An example of how these methods can be applied in practice is the case study carried out with Highways England.

Methods based on real options analysis

The main advantage of these methods is that they can be applied even when not all possible use cases for the data are yet defined. Their aim is to get an estimate of the value of the data in certain possible future scenarios - usually through computer simulation - so that if such a scenario is reached, exploitation of the data could be justified. Thus, certain data-related decisions and investments could be postponed until the ideal scenario that maximises the value of the data is reached, thereby minimising the associated costs and risks until that point.
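A minimal sketch of this idea, assuming a single exploitation decision and entirely invented parameters, is shown below: possible future payoffs are simulated and the data is only "exploited" in scenarios where the payoff exceeds the cost, so holding the data behaves like an option whose value can be estimated by simulation.

```python
# Minimal sketch of a real-options view: simulate possible future payoffs
# from exploiting the data and only "exercise" (invest) in scenarios where
# the payoff exceeds the exploitation cost. All parameters are hypothetical.
import random

random.seed(42)

exploitation_cost = 100_000   # assumed cost of building the product that uses the data
discount_rate = 0.08          # one-year horizon for simplicity
n_scenarios = 10_000

option_values = []
for _ in range(n_scenarios):
    payoff = random.lognormvariate(11.3, 0.6)           # uncertain future payoff
    option_values.append(max(payoff - exploitation_cost, 0))

option_value = (sum(option_values) / n_scenarios) / (1 + discount_rate)
print(f"Estimated option value of holding the data: {option_value:,.0f} €")
```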

The UK case study on the transport sector provides an example of how these methods could be applied using financial models.

And which method should I use in my particular case?

Unfortunately, there is no golden rule for selecting a particular method. However, there are a number of questions that the authors of the study suggest we ask ourselves in order to find the most appropriate method for each case:

  • What exactly we are assessing: Data goes through several stages in its life cycle - from raw data to processed data, analysis or generated knowledge. Depending on which phase we want to focus our analysis on, some methods may be more appropriate than others.
  • From which point of view the valuation is carried out: value can have different definitions depending on the point of view of who is carrying out or commissioning a valuation. In some cases, for example, cost containment due to budgetary constraints may be the priority, while in others one might choose to try to maximise social value.
  • When the assessment process takes place: basically it should be considered whether the assessment will be carried out in a predictive way before all the elements assessed are available or whether it will be carried out a posteriori, once all the variables are already known.
  • What is the purpose of the assessment: several of the available methods omit or minimise certain aspects of the data by focusing on other features of the assessment process. Therefore, it will be necessary to be clear about the priorities of our evaluation when selecting the most appropriate method: are we interested in social impact, improving productivity or maximising the cost-effectiveness of data?

We should therefore first analyse our needs and our own definition of value, asking ourselves what exactly we want to evaluate and how best to carry out that evaluation, and then develop our own valuation framework using the most appropriate methods from the wide variety available.


Content prepared by Carlos Iglesias, Open Data Researcher and Consultant, World Wide Web Foundation.

The contents and views expressed in this publication are the sole responsibility of the author.