On Data Thinning for Model Validation in Small Area Estimation
Sho Kawano, Paul A. Parker, Zehang Richard Li

TL;DR
This paper introduces a data thinning scheme for validating small area estimation (SAE) models. The scheme enables out-of-sample validation using only area-level direct survey estimates, which addresses a key challenge: separate validation data are rarely available in SAE.
Contribution
It proposes a novel validation scheme based on data thinning for the Fay-Herriot model, with theoretical analysis and practical guidelines for balancing bias and variance.
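For reference, the Fay-Herriot model in its standard textbook form is shown below. The notation is chosen here for illustration and may differ from the paper's. Here $y_i$ is the direct survey estimate for area $i$, $D_i$ is its sampling variance (treated as known), and $\mathbf{x}_i$ are area-level covariates.

```latex
\begin{aligned}
y_i &= \theta_i + e_i, & e_i &\sim N(0, D_i), \\
\theta_i &= \mathbf{x}_i^\top \boldsymbol{\beta} + v_i, & v_i &\sim N(0, \sigma_v^2), \qquad i = 1, \dots, m.
\end{aligned}
```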
Findings
Thinning-based validation provides consistent performance across diverse sampling designs.
The analysis reveals a bias-variance tradeoff governed by the choice of thinning parameters.
Simulations with American Community Survey data demonstrate the approach's effectiveness.
Abstract
Small area estimation (SAE) produces estimates of population parameters for geographic and demographic subgroups with limited sample sizes. Such estimates are critical for informing policy decisions, ranging from poverty mapping to social program funding. Despite its widespread use, principled validation of SAE models remains challenging and general guidelines are far from well-established. Unlike conventional predictive modeling settings, validation data are rarely available in the SAE context. External validation surveys or censuses often do not exist, and access to individual-level microdata is often restricted, making standard cross-validation infeasible. In this paper, we propose a novel model validation scheme using only area-level direct survey estimates under the widely used Fay-Herriot model. Our approach is based on data thinning, which splits area-level observations into…
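The abstract breaks off before describing how observations are split, so the following is not the authors' procedure. It is a minimal sketch of the generic Gaussian data-thinning construction, under the usual Fay-Herriot assumption that each direct estimate satisfies y_i ~ N(theta_i, D_i) with known sampling variance D_i. The function names `thin_gaussian` and `thin_areas` and the single thinning fraction `eps` are hypothetical choices for illustration.

```python
import math
import random


def thin_gaussian(y, D, eps, rng):
    """Split one direct estimate y ~ N(theta, D) into two independent pieces.

    Assumes D (the sampling variance) is known. Draws
    y_train | y ~ N(eps * y, eps * (1 - eps) * D) and sets y_test = y - y_train.
    Marginally y_train ~ N(eps * theta, eps * D) and
    y_test ~ N((1 - eps) * theta, (1 - eps) * D), and the two are independent.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    y_train = rng.gauss(eps * y, math.sqrt(eps * (1.0 - eps) * D))
    y_test = y - y_train
    return y_train, y_test


def thin_areas(ys, Ds, eps, seed=0):
    """Thin every area's direct estimate with the same (hypothetical) fraction eps."""
    rng = random.Random(seed)
    pairs = [thin_gaussian(y, D, eps, rng) for y, D in zip(ys, Ds)]
    train = [p[0] for p in pairs]
    test = [p[1] for p in pairs]
    return train, test


if __name__ == "__main__":
    ys = [1.0, 2.0, -0.5]  # illustrative direct estimates, not real data
    Ds = [0.5, 1.0, 2.0]   # illustrative known sampling variances
    eps = 0.5
    train, test = thin_areas(ys, Ds, eps, seed=1)
    # Rescale so both pieces are unbiased for theta_i:
    # train / eps has variance D / eps; test / (1 - eps) has variance D / (1 - eps).
    print([t / eps for t in train])
    print([t / (1.0 - eps) for t in test])
```

In this sketch, a model would be fit to the rescaled training pieces and scored against the held-out pieces. Raising `eps` makes the fitting data more precise (variance D/eps) but the held-out data noisier (variance D/(1 - eps)). This is plausibly one source of the bias-variance tradeoff in the thinning parameter noted in the Findings.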
