# Large-scale and high-resolution analysis of food purchases and health outcomes

**Authors:** Luca Maria Aiello, Rossano Schifanella, Daniele Quercia, Lucia Del Prete

arXiv: 1905.00140 · 2019-05-02

## TL;DR

This study leverages large-scale digital grocery purchase data and medical records to analyze the relationship between food consumption patterns and health outcomes in London, revealing key predictors for metabolic syndrome-related diseases.

## Contribution

The paper links nutrient intake, inferred from loyalty-card grocery purchases, to health outcomes derived from prescription records at census-area resolution across London, and shows that such purchase records can serve as a cheap, scalable tool for health surveillance.

## Key findings

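- Nutrient diversity and amount of calories are the strongest predictors of the prevalence of hypertension, high cholesterol, and diabetes.
- Linear regression models reach an R2 of 0.6 when estimating diabetes prevalence across nearly 1000 London census areas, and a classifier identifies (un)healthy areas with up to 91% accuracy.
- Healthy areas tend to eat fewer carbohydrates and less sugar, diversify nutrients, and avoid large quantities; they are not necessarily well-off.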
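
To make "nutrient diversity" concrete, here is a minimal sketch of one way such a score could be computed. It assumes diversity is measured as the normalized Shannon entropy of nutrient shares in an area's purchases; this summary does not state the paper's exact definition, so the function name `nutrient_diversity`, the nutrient categories, and the example amounts are all hypothetical.

```python
import math


def nutrient_diversity(nutrient_amounts):
    """Normalized Shannon entropy of nutrient shares.

    Assumption (not taken from the summary): diversity is scored as
    entropy over the share of each nutrient in total purchases.
    Returns 0.0 when a single nutrient dominates completely and 1.0
    when all listed nutrients are bought in equal shares.
    """
    total = sum(nutrient_amounts.values())
    if total <= 0:
        raise ValueError("nutrient amounts must sum to a positive value")
    if len(nutrient_amounts) < 2:
        return 0.0  # diversity is undefined for one category; treat as 0
    # Zero shares contribute nothing to the entropy, so skip them.
    shares = [v / total for v in nutrient_amounts.values() if v > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    # Divide by the maximum possible entropy to land in [0, 1].
    return entropy / math.log(len(nutrient_amounts))


# Made-up per-area nutrient amounts, for illustration only.
area_a = {"carbs": 25.0, "fat": 25.0, "protein": 25.0, "fibre": 25.0}
area_b = {"carbs": 85.0, "fat": 5.0, "protein": 5.0, "fibre": 5.0}
print(nutrient_diversity(area_a), nutrient_diversity(area_b))
```

Under this assumed definition, an area whose purchases are dominated by one nutrient scores lower than an area with an even mix, which is the direction of the "diversify nutrients" finding.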

## Abstract

To complement traditional dietary surveys, which are costly and of limited scale, researchers have resorted to digital data to infer the impact of eating habits on people's health. However, online studies are limited in resolution: they are carried out at regional level and do not capture precisely the composition of the food consumed. We study the association between food consumption (derived from the loyalty cards of the main grocery retailer in London) and health outcomes (derived from publicly-available medical prescription records). The scale and granularity of our analysis is unprecedented: we analyze 1.6B food item purchases and 1.1B medical prescriptions for the entire city of London over the course of one year. By studying food consumption down to the level of nutrients, we show that nutrient diversity and amount of calories are the strongest predictors of the prevalence of three diseases related to what is called the "metabolic syndrome": hypertension, high cholesterol, and diabetes. This syndrome is a cluster of symptoms generally associated with obesity, is common across the rich world, and affects one in four adults in the UK. Our linear regression models achieve an R2 of 0.6 when estimating the prevalence of diabetes in nearly 1000 census areas in London, and a classifier can identify (un)healthy areas with up to 91% accuracy. Interestingly, healthy areas are not necessarily well-off (income matters less than what one would expect) and have distinctive features: they tend to systematically eat less carbohydrates and sugar, diversify nutrients, and avoid large quantities. More generally, our study shows that analytics of digital records of grocery purchases can be used as a cheap and scalable tool for health surveillance and, upon these records, different stakeholders from governments to insurance companies to food companies could implement effective prevention strategies.
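
The R2 figure quoted in the abstract is the share of variance in disease prevalence that a linear model accounts for. The sketch below shows how that quantity is computed for an ordinary least-squares fit with a single predictor. It is a simplification: the abstract indicates the authors' models use several nutrient features, and the helper names `fit_line` and `r_squared` as well as the toy numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy data, not from the paper: one feature value and one prevalence
# value per hypothetical area.
feature = [0.0, 1.0, 2.0, 3.0]
prevalence = [0.0, 1.0, 0.0, 1.0]
slope, intercept = fit_line(feature, prevalence)
print(r_squared(feature, prevalence, slope, intercept))
```

An R2 of 0.6 therefore means the fitted model leaves about 40% of the area-to-area variation in diabetes prevalence unexplained.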

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.00140/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1905.00140/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/1905.00140/full.md

---
Source: https://tomesphere.com/paper/1905.00140