We live in an age of digitalization and datafication: ever more data is generated, and we increasingly observe the world through it. Satellite images, mobile phone data, commercial transactions, environmental sensors and social networks are some of the sources that help answer questions of public interest in areas such as health, the environment, spatial planning and food production.
New York University's Open Data Policy Lab has been documenting this phenomenon for years as part of what it calls the "Third Wave of Open Data," a movement toward purpose-driven open data that extends its reach to the private sector and focuses on responsible use of information. On its blog, the lab regularly compiles recent examples of how researchers, administrations, and international organizations are leveraging this non-traditional data in various fields.
What is non-traditional data?
An article published by the Centre for Digital Development at the University of Manchester defines non-traditional data (NTD) as data that is captured, mediated or observed using digital technologies and that, in many cases, is generated by private companies or technology platforms. This type of data usually arises as a by-product of another everyday activity or of the operation of digital infrastructure: a phone call, a supermarket purchase, a post on a social network or a satellite passing overhead. It typically provides information continuously and with a high level of geographical detail.
The term is often used in contrast to traditional data, which is deliberately collected using standardized methodologies and established measurement processes, such as official censuses, statistical surveys, or administrative records. Traditional data tends to be collected less frequently and has a well-defined structure and an explicit purpose: to describe social, economic, or demographic phenomena with high levels of control and validation.
Both types of data are valuable, but combining them multiplies their potential. Analyzed together, they can better capture rapid changes and fine-grained patterns of social behavior.
Below, we delve into three recent examples collected by the Open Data Policy Lab, which show how non-traditional data is being applied in very different areas, with tangible consequences for society.
Public health: Loyalty cards to detect early signs of cancer
One of the most striking examples of non-traditional data use in healthcare is the reuse of retail loyalty card data to investigate whether shopping habits can anticipate a cancer diagnosis.
A research team from Imperial College London is using loyalty card data from two British supermarket chains, with the consent of about 3,000 participants, to analyze whether purchasing patterns, especially of over-the-counter medicines, change before a cancer diagnosis occurs. The researchers compared the shopping habits of people with cancer with those of healthy people, which allowed them to identify subtle behavioral changes prior to diagnosis.
Previous studies had already shown that buying patterns could anticipate a diagnosis of ovarian cancer up to eight months before it was clinically confirmed. Extending this approach to other cancers could facilitate earlier detection, encouraging people to seek medical care sooner. This is a clear example of how data generated for purely commercial purposes can provide health signals that traditional systems do not capture on their own.
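As a rough illustration of the kind of comparison described above, the following Python sketch contrasts average purchases of a product category between a case group and a comparison group at different points before a reference date. It is not the researchers' method: the record layout, the function names and all the numbers are assumptions made up for this example.

```python
from statistics import mean

def mean_monthly_purchases(records, months_before):
    """Average purchases per person in the given month before the reference date."""
    counts = [r["purchases"] for r in records if r["months_before"] == months_before]
    return mean(counts) if counts else 0.0

def purchase_ratio(cases, controls, months_before):
    """Ratio of average purchases (cases / controls); None if it cannot be computed."""
    control_avg = mean_monthly_purchases(controls, months_before)
    if control_avg == 0:
        return None
    return mean_monthly_purchases(cases, months_before) / control_avg

# Invented toy records: over-the-counter purchases per person and month.
cases = [
    {"person": "c1", "months_before": 8, "purchases": 1},
    {"person": "c2", "months_before": 8, "purchases": 1},
    {"person": "c1", "months_before": 1, "purchases": 3},
    {"person": "c2", "months_before": 1, "purchases": 5},
]
controls = [
    {"person": "h1", "months_before": 8, "purchases": 1},
    {"person": "h2", "months_before": 8, "purchases": 1},
    {"person": "h1", "months_before": 1, "purchases": 2},
    {"person": "h2", "months_before": 1, "purchases": 2},
]

for month in (8, 1):
    print(month, purchase_ratio(cases, controls, month))
```

A ratio that drifts away from 1 as the reference date approaches would be the kind of signal worth examining. A real study would also need matched comparison groups, far more data and proper statistical testing.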
Mobility: Responding to SMS evacuation alerts
A study published in December 2025 used anonymized data from mobile phone networks to analyze how the population responds to wildfire evacuation alerts sent by SMS. The researchers monitored the activity of about 580,000 devices at 15-minute intervals during the February 2024 wildfires in Valparaíso, Chile. To do this, they used the changes in connection to the telephone antennas as an indicator of population movement, and compared these patterns before and after the alerts were sent. This information was combined with the records of the alerts themselves and with socioeconomic data to understand if the response varied according to the type of community.
The analysis showed that early alerts elicited clear population movement, while repeated alerts generated an increasingly weak response. It was also observed that higher-income areas responded more quickly and that displacement occurred even in areas that had not been directly alerted. This type of evidence can help design more efficient warning protocols and anticipate how the population will actually behave during an emergency.
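The idea of using antenna changes as a proxy for movement can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the study's actual pipeline: the tuple layout `(device_id, interval, antenna_id)`, the function names and the toy data are all hypothetical.

```python
from collections import defaultdict

def movement_rate_by_interval(observations):
    """Share of devices whose antenna differs from the one seen in the previous interval.

    observations: iterable of (device_id, interval, antenna_id), where interval
    is an integer index of a 15-minute slot.
    """
    by_device = defaultdict(dict)
    for device_id, interval, antenna_id in observations:
        by_device[device_id][interval] = antenna_id

    changed = defaultdict(int)
    seen = defaultdict(int)
    for antennas in by_device.values():
        for interval, antenna in antennas.items():
            previous = antennas.get(interval - 1)
            if previous is None:
                continue  # no observation in the preceding slot to compare with
            seen[interval] += 1
            if antenna != previous:
                changed[interval] += 1
    return {t: changed[t] / seen[t] for t in sorted(seen)}

def before_after(rates, alert_interval, window):
    """Mean movement rate after the alert minus the mean rate before it."""
    before = [rates[t] for t in range(alert_interval - window, alert_interval) if t in rates]
    after = [rates[t] for t in range(alert_interval, alert_interval + window) if t in rates]
    if not before or not after:
        return None
    return sum(after) / len(after) - sum(before) / len(before)

# Invented toy data: two devices, five slots, alert sent at slot 3.
observations = [
    ("a", 0, "A1"), ("a", 1, "A1"), ("a", 2, "A1"), ("a", 3, "A2"), ("a", 4, "A3"),
    ("b", 0, "B1"), ("b", 1, "B1"), ("b", 2, "B1"), ("b", 3, "B1"), ("b", 4, "B2"),
]
rates = movement_rate_by_interval(observations)
print(rates)
print(before_after(rates, alert_interval=3, window=2))
```

A positive difference suggests more movement after the alert than before it. Comparing that difference across successive alerts, or across areas grouped by income, would mirror the questions the study asked.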
Environment: Assessing the heat resilience of buildings with drones and street imagery
Rising temperatures are a growing concern for cities worldwide. A study conducted in the Msimbazi River Delta area of Dar es Salaam, Tanzania, used drone imagery and street-level photography to analyze which building features influence heat exposure in urban environments. The researchers combined these visual sources with satellite-derived surface temperature data and building maps. They trained an artificial intelligence model capable of automatically extracting attributes such as the material of roofs and facades, the presence of vegetation, building density or the reflectivity of surfaces, and relating them to the observed thermal patterns.
The analysis makes it possible to identify which features of the built environment contribute to reducing heat exposure, offering useful guidelines for urban design and building rehabilitation. This information is particularly relevant in cities exposed to an increasing risk of heatwaves, as it facilitates more targeted interventions aimed at protecting vulnerable populations, rather than general measures applied at the urban scale.
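Once building attributes have been extracted, relating them to temperature can start with something as simple as a grouped average. The Python sketch below is a deliberately minimal stand-in for the study's AI model, not a reproduction of it: the field names (`roof_type`, `surface_temp_c`), the category labels and the values are invented for illustration and say nothing about the actual findings.

```python
from collections import defaultdict

def mean_temperature_by(buildings, attribute):
    """Average surface temperature for each value of a building attribute."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for building in buildings:
        key = building[attribute]
        totals[key] += building["surface_temp_c"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Invented toy records: one row per building, with a placeholder roof category
# and a surface temperature in degrees Celsius.
buildings = [
    {"roof_type": "type_a", "surface_temp_c": 40.0},
    {"roof_type": "type_a", "surface_temp_c": 42.0},
    {"roof_type": "type_b", "surface_temp_c": 36.0},
    {"roof_type": "type_b", "surface_temp_c": 38.0},
]

print(mean_temperature_by(buildings, "roof_type"))
```

A gap between groups is only a starting point. Attributes such as vegetation and building density tend to vary together, so a real analysis would need a model that accounts for several features at once.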
The potential of combining data
These three cases illustrate how non-traditional data is especially useful where conventional measurement is often slow or too aggregated, whether it's to detect early signs of illness, understand how the population actually responds to an emergency, or identify which buildings are most exposed to heat.
In these examples, as well as in others that the Open Data Policy Lab has been collecting in recent years, the greatest potential appears when these data are combined with existing reference sources, such as socioeconomic data, temperature records or maps, which allow the signals detected to be validated and correctly interpreted.
As these sources become established, they are increasingly incorporated into public and private decision-making. In this context, governance issues such as who has access to the data and how to ensure responsible use that protects people's privacy are becoming more important. Addressing these challenges will be key to ensuring that these sources generate value safely and reliably.
These examples show that it is possible to extract genuinely useful information from non-traditional data. The challenge now is to build the safeguards needed to do so without eroding the trust of those who generate that data every day.