Debiasing Machine Learning Predictions for Causal Inference Without Additional Ground Truth Data: "One Map, Many Trials" in Satellite-Driven Poverty Analysis
Markus B. Pettersson, Connor T. Jerzak, Adel Daoud

TL;DR
This paper presents two post-hoc correction methods, linear calibration correction (LCC) and Tweedie's correction, that reduce bias in satellite-based wealth predictions used for causal inference without requiring additional ground-truth data, enabling more accurate policy evaluations.
Contribution
The paper introduces practical, data-efficient correction techniques for predictive bias in satellite-derived wealth estimates used in causal inference, without requiring new labeled data.
Findings
Both methods substantially reduce the attenuation bias caused by predictions that shrink toward the mean.
Tweedie's correction achieves nearly unbiased treatment effect estimates.
Both corrections are applicable to imputed outcomes beyond wealth mapping.
Abstract
Machine learning models trained on Earth observation data, such as satellite imagery, have demonstrated significant promise in predicting household-level wealth indices, enabling the creation of high-resolution wealth maps that can be leveraged across multiple causal trials while addressing chronic data scarcity in global development research. However, because standard training objectives prioritize overall predictive accuracy, these predictions often suffer from shrinkage toward the mean, leading to attenuated estimates of causal treatment effects and limiting their utility in policy evaluations. Existing debiasing methods, such as Prediction-Powered Inference (PPI), can handle this attenuation bias but require additional fresh ground-truth data at the downstream stage of causal inference, which restricts their applicability in data-scarce environments. We introduce and evaluate two…
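The attenuation mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's code: `predict_with_shrinkage` is a hypothetical stand-in for a wealth model, the shrinkage factor of 0.6 and the data-generating process are assumptions chosen for illustration, and the correction shown is a generic linear calibration fitted on held-out labels from the upstream training stage, which may differ in detail from the authors' LCC. It assumes the prediction error does not depend on treatment once the true outcome is given.

```python
import numpy as np


def predict_with_shrinkage(y, rng, shrink=0.6, mu=0.0, noise_sd=0.1):
    """Hypothetical stand-in for an ML model whose predictions are pulled toward mu."""
    return mu + shrink * (y - mu) + rng.normal(scale=noise_sd, size=y.shape)


def diff_in_means(outcome, treat):
    """Difference-in-means estimate of the average treatment effect."""
    return outcome[treat == 1].mean() - outcome[treat == 0].mean()


def fit_calibration(y_true, y_pred):
    """Regress predictions on true labels: y_pred ~ intercept + slope * y_true."""
    slope, intercept = np.polyfit(y_true, y_pred, 1)
    return intercept, slope


def invert_calibration(y_pred, intercept, slope):
    """Undo the estimated shrinkage so predictions are back on the label scale."""
    return (y_pred - intercept) / slope


def run_demo(n=20_000, tau=1.0, seed=0):
    rng = np.random.default_rng(seed)

    # Upstream stage: held-out labels that already exist from model training
    # (assumed available; no new downstream ground truth is collected).
    y_cal = rng.normal(scale=1.5, size=n)
    y_cal_pred = predict_with_shrinkage(y_cal, rng)
    intercept, slope = fit_calibration(y_cal, y_cal_pred)

    # Downstream trial: the true outcome y is simulated but treated as unobserved;
    # only the model predictions are used for estimation.
    treat = rng.integers(0, 2, size=n)
    y = tau * treat + rng.normal(size=n)
    y_pred = predict_with_shrinkage(y, rng)

    naive = diff_in_means(y_pred, treat)
    corrected = diff_in_means(invert_calibration(y_pred, intercept, slope), treat)
    return naive, corrected


naive, corrected = run_demo()
print(f"naive estimate:     {naive:.3f}")
print(f"corrected estimate: {corrected:.3f}")
```

Under these assumptions the naive estimate lands near the shrinkage factor times the true effect (about 0.6 for a true effect of 1.0), while the calibrated estimate recovers a value close to 1.0. The slope is obtained by regressing predictions on labels rather than the reverse, because that slope is the factor by which the treatment effect is attenuated.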
Taxonomy
Topics: Advanced Causal Inference Techniques · Income, Poverty, and Inequality · Spatial and Panel Data Analysis
