On cross-validation for small area estimators

Qianyu Dong; Zehang Richard Li

arXiv:2604.23464 · stat.ME · May 12, 2026

TL;DR

This paper introduces a new cross-validation framework for evaluating small area estimators in public health surveys, addressing challenges of model comparison and uncertainty quantification.

Contribution

The paper develops a model-agnostic, theoretically grounded cross-validation method for comparing small area estimation (SAE) models fitted to data from complex survey designs.

Findings

01 Conventional leave-one-area-out CV can mislead model rankings.

02 The proposed framework provides more robust model comparison.

03 Simulation studies validate the effectiveness of the new approach.

Abstract

Subnational monitoring of public health often relies on household surveys where data are sparse at the desired spatial resolution. Small area estimation (SAE) methods address this challenge by borrowing strength across areas and incorporating auxiliary information. However, comparing these estimators remains difficult in the absence of ground truth. We propose a cross-validation framework for evaluating small area estimators that accommodates complex survey designs. Our approach enables model-agnostic comparisons between area-level and unit-level SAE models. Central to our framework is a decomposition of the cross-validated squared error, which reveals both identifiable bias and unidentifiable components that can be bounded. Our theoretical results and simulation studies show that conventional approaches, such as leave-one-area-out cross-validation, can yield misleading model rankings,…
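
To make the baseline concrete, here is a minimal sketch of conventional leave-one-area-out cross-validation, the approach the abstract cautions against. It is not the paper's proposed framework. It assumes an area-level setting with direct estimates `y`, known sampling variances `v`, and a simple weighted least-squares synthetic estimator; the function names `loao_cv` and `cv_scores` are hypothetical. The "adjusted" score subtracts the mean sampling variance, which targets the error against the true area means only under the additional assumption that sampling errors are independent across areas.

```python
import numpy as np


def loao_cv(y, X, v):
    """Leave-one-area-out predictions from a weighted least-squares fit.

    y : (n,) direct survey estimates, one per area (noisy validation targets)
    X : (n, p) area-level covariates, including an intercept column
    v : (n,) sampling variances of the direct estimates, treated as known
    """
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i           # drop area i from the training set
        w = 1.0 / v[keep]                  # inverse-variance weights
        Xw = X[keep] * w[:, None]          # rows of X scaled by their weights
        # Solve the weighted normal equations (X'WX) beta = X'Wy.
        beta = np.linalg.solve(Xw.T @ X[keep], Xw.T @ y[keep])
        preds[i] = X[i] @ beta             # predict the held-out area
    return preds


def cv_scores(y, preds, v):
    """Raw and variance-adjusted mean squared CV error.

    The raw score compares predictions with direct estimates, so it includes
    the sampling noise of the targets. Assuming the held-out sampling error is
    independent of the prediction, subtracting mean(v) removes that noise in
    expectation. The adjusted score can be negative in finite samples.
    """
    raw = float(np.mean((y - preds) ** 2))
    adjusted = raw - float(np.mean(v))
    return raw, adjusted


if __name__ == "__main__":
    # Simulated illustration only; none of these numbers come from the paper.
    rng = np.random.default_rng(0)
    n = 40
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    theta = 1.0 + 0.5 * x + rng.normal(scale=0.2, size=n)  # true area means
    v = rng.uniform(0.05, 0.3, size=n)                     # sampling variances
    y = theta + rng.normal(scale=np.sqrt(v))               # direct estimates
    raw, adjusted = cv_scores(y, loao_cv(y, X, v), v)
    print(f"raw CV error: {raw:.3f}, variance-adjusted: {adjusted:.3f}")
```

Because the validation targets here are themselves noisy direct estimates, two models can be ranked differently by the raw score than by their error against the true area means, which is one way a scheme like this can mislead.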
