Can Strategic Data Collection Improve the Performance of Poverty Prediction Models?
Satej Soman, Emily Aiken, Esther Rolf, and Joshua Blumenstock

TL;DR
This study investigates whether adaptive sampling strategies can enhance machine learning models for poverty prediction, finding that traditional uniform sampling performs as well as more complex active learning methods.
Contribution
The paper provides a systematic comparison of sampling strategies for training poverty prediction models, revealing that active learning does not outperform uniform sampling in this context.
Findings
Active learning methods did not improve model performance over uniform sampling.
In simulations, sampling based on model uncertainty or on model performance on sub-populations showed no advantage over uniform-at-random sampling.
Results suggest re-evaluating data collection approaches for poverty estimation.
Abstract
Machine learning-based estimates of poverty and wealth are increasingly being used to guide the targeting of humanitarian aid and the allocation of social assistance. However, the ground truth labels used to train these models are typically borrowed from existing surveys that were designed to produce national statistics -- not to train machine learning models. Here, we test whether adaptive sampling strategies for ground truth data collection can improve the performance of poverty prediction models. Through simulations, we compare the status quo sampling strategies (uniform at random and stratified random sampling) to alternatives that prioritize acquiring training data based on model uncertainty or model performance on sub-populations. Perhaps surprisingly, we find that none of these active learning methods improve over uniform-at-random sampling. We discuss how these results can help…
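The sampling strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every function name, the toy units, the strata labels, and the uncertainty and error values below are hypothetical, chosen only to show how each strategy would pick which units to survey next under a fixed labeling budget k.

```python
import random
from collections import defaultdict


def sample_uniform(candidates, k, rng):
    """Uniform at random: every unlabeled unit is equally likely to be chosen."""
    return rng.sample(candidates, k)


def sample_stratified(candidates, strata, k, rng):
    """Stratified random: split the budget across strata in proportion to size.

    Rounding the per-stratum quota means the total can differ slightly from k.
    """
    by_stratum = defaultdict(list)
    for unit in candidates:
        by_stratum[strata[unit]].append(unit)
    chosen = []
    for units in by_stratum.values():
        quota = round(k * len(units) / len(candidates))
        chosen.extend(rng.sample(units, min(quota, len(units))))
    return chosen


def sample_most_uncertain(candidates, uncertainty, k):
    """Uncertainty-based: take the k units the current model is least sure about."""
    return sorted(candidates, key=lambda u: uncertainty[u], reverse=True)[:k]


def sample_worst_group(candidates, strata, group_error, k, rng):
    """Performance-based: sample from the sub-population with the highest error."""
    worst = max(group_error, key=group_error.get)
    members = [u for u in candidates if strata[u] == worst]
    return rng.sample(members, min(k, len(members)))


# Hypothetical toy pool: 12 units, half "urban" and half "rural".
rng = random.Random(0)
units = list(range(12))
strata = {u: "urban" if u < 6 else "rural" for u in units}
uncertainty = {u: u / 10 for u in units}        # assumed model uncertainty
group_error = {"urban": 0.2, "rural": 0.5}      # assumed per-group error

picked_uniform = sample_uniform(units, 4, rng)
picked_stratified = sample_stratified(units, strata, 4, rng)
picked_uncertain = sample_most_uncertain(units, uncertainty, 3)
picked_worst = sample_worst_group(units, strata, group_error, 4, rng)
```

In a simulation of the kind the abstract describes, each strategy would be applied repeatedly, with the model retrained on the newly labeled units after each round, and the resulting prediction performance compared across strategies.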
Taxonomy
Topics: COVID-19 epidemiological studies · Income, Poverty, and Inequality · Child Nutrition and Water Access
