Small Area Estimation with Random Forests and the LASSO
Victoire Michal, Jon Wakefield, Alexandra M. Schmidt, Alicia Cavanaugh, Brian Robinson, Jill Baumgartner

TL;DR
This paper compares random forests, the LASSO, and other methods for small area estimation with auxiliary data, proposes a new uncertainty measure, and applies the methods to Ghanaian household consumption data.
Contribution
It introduces a modified split conformal procedure for quantifying the uncertainty of random forest and LASSO estimates in small area estimation, and applies it to real survey data.
Findings
The Bayesian shrinkage method had the lowest bias and mean squared error (MSE).
Household consumption varied substantially across areas.
The proposed uncertainty measure captures the variability of the estimates.
Abstract
We consider random forests and LASSO methods for model-based small area estimation when the number of areas with sampled data is a small fraction of the total areas for which estimates are required. Abundant auxiliary information is available for the sampled areas, from the survey, and for all areas, from an external source, and the goal is to use auxiliary variables to predict the outcome of interest. We compare areal-level random forests and LASSO approaches to a frequentist forward variable selection approach and a Bayesian shrinkage method. Further, to measure the uncertainty of estimates obtained from random forests and the LASSO, we propose a modification of the split conformal procedure that relaxes the assumption of identically distributed data. This work is motivated by Ghanaian data available from the sixth Living Standard Survey (GLSS) and the 2010 Population and Housing…
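The split conformal procedure that the abstract modifies can be illustrated with a minimal sketch. This shows only the *standard* split conformal method, not the authors' modification: it assumes exchangeable (identically distributed) data, which is exactly the assumption the paper relaxes, and the details of that relaxation are not given here. The names `split_conformal` and `ols_fit` are hypothetical, and ordinary least squares stands in for an areal-level random forest or LASSO fit; any function that takes `(X, y)` and returns a predictor could be passed as `fit`.

```python
import numpy as np


def split_conformal(fit, X, y, X_new, alpha=0.1, seed=0):
    """Standard split conformal intervals (assumes exchangeable data).

    fit: hypothetical interface, a callable (X, y) -> predict function,
    standing in for a random forest or LASSO fit.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    # Split the sampled units into a training half and a calibration half.
    train, cal = idx[: n // 2], idx[n // 2 :]
    predict = fit(X[train], y[train])
    # Nonconformity scores: absolute residuals on the calibration half.
    scores = np.sort(np.abs(y[cal] - predict(X[cal])))
    n_cal = len(cal)
    # The ceil((n_cal + 1)(1 - alpha))-th smallest score gives
    # (1 - alpha) marginal coverage under exchangeability.
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))
    q = np.inf if k > n_cal else scores[k - 1]
    mu = predict(X_new)
    return mu - q, mu + q


def ols_fit(X, y):
    # Ordinary least squares with an intercept: a placeholder for RF or LASSO.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta
```

For example, `lo, hi = split_conformal(ols_fit, X, y, X_new, alpha=0.1)` returns intervals with nominal 90% marginal coverage when the data are exchangeable. Because every interval shares the same half-width `q`, this baseline cannot adapt to areas whose errors are distributed differently.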
Taxonomy
Topics: demographic modeling and climate adaptation · Land Use and Ecosystem Services · Spatial and Panel Data Analysis
