
TL;DR
This paper evaluates the effectiveness of econometric and machine learning models, especially random forests, in predicting poverty with incomplete income data, highlighting their relative performance across various missing data scenarios.
Contribution
It systematically compares classic econometric and machine learning models for poverty prediction under different missing data patterns, revealing the strengths of random forests.
Findings
Random forests outperform other models in most scenarios.
Model accuracy varies with missing data patterns.
Complete data leads to better poverty prediction accuracy.
Abstract
Poverty prediction models are used to address missing data issues in a variety of contexts, such as poverty profiling, targeting with proxy-means tests, cross-survey imputations such as poverty mapping, top and bottom income studies, and vulnerability analyses. Drawing on the models used in this literature, this paper takes data free of missing incomes and artificially corrupts them with different patterns and shares of missing incomes. It then compares the capacity of classic econometric and machine learning models to predict poverty under these scenarios, where both observed and unobserved incomes are fully known and the true counterfactual poverty rate is therefore available as a benchmark. Random forest provides more consistent and accurate predictions under most, but not all, scenarios.
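The corrupt-then-predict design described in the abstract can be illustrated with a minimal sketch. This is not the paper's code or data: the incomes are synthetic, the poverty line, the two missingness patterns (`mcar`, `mnar`), and all function names are hypothetical, and a one-covariate OLS imputation stands in for the econometric and machine learning models the paper actually compares (no random forest is fitted here). The point is only to show how knowing the deleted incomes lets each prediction be scored against the true poverty rate.

```python
import random

POVERTY_LINE = 0.2  # hypothetical poverty line on the log-income scale


def simulate(n, rng):
    """Synthetic (covariate, log income) pairs with no missing values."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # hypothetical household covariate
        rows.append((x, 1.0 + 0.8 * x + rng.gauss(0.0, 0.5)))
    return rows


def missing_mask(rows, share, pattern, rng):
    """Flag incomes to delete (True = missing).

    'mcar' deletes completely at random; 'mnar' (an assumed pattern)
    deletes below-median incomes three times as often as above-median
    ones, keeping the overall share of missing incomes near `share`.
    """
    if pattern == "mcar":
        return [rng.random() < share for _ in rows]
    median = sorted(y for _, y in rows)[len(rows) // 2]
    return [rng.random() < (1.5 if y < median else 0.5) * share
            for _, y in rows]


def poverty_rate(incomes):
    """Share of log incomes below the poverty line."""
    return sum(y < POVERTY_LINE for y in incomes) / len(incomes)


def impute_ols(rows, mask, rng):
    """Fit log income on x by OLS using observed rows only, then fill each
    missing income with its prediction plus a resampled observed residual."""
    obs = [(x, y) for (x, y), m in zip(rows, mask) if not m]
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    slope = (sum((x - mx) * (y - my) for x, y in obs)
             / sum((x - mx) ** 2 for x, _ in obs))
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in obs]
    return [(intercept + slope * x + rng.choice(resid)) if m else y
            for (x, y), m in zip(rows, mask)]


def run_experiment(n=5000, share=0.4, seed=0):
    """Corrupt complete data under each pattern and score the predictions
    against the true poverty rate, which is known by construction."""
    rng = random.Random(seed)
    rows = simulate(n, rng)
    results = {"true": poverty_rate([y for _, y in rows])}
    for pattern in ("mcar", "mnar"):
        mask = missing_mask(rows, share, pattern, rng)
        observed = [y for (_, y), m in zip(rows, mask) if not m]
        results[pattern] = {
            "complete_case": poverty_rate(observed),
            "imputed": poverty_rate(impute_ols(rows, mask, rng)),
        }
    return results


if __name__ == "__main__":
    print(run_experiment())
```

In this toy setup, dropping the missing incomes understates poverty when low incomes are more likely to be missing, which is the kind of scenario-dependent behaviour the comparison is designed to expose. Swapping `impute_ols` for another predictor would allow the same scoring loop to be reused.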
