Sample-split REGression (SREG): A robust estimator for high-dimensional survey data
Yonghyun Kwon, Shu Yang, Jae Kwang Kim

TL;DR
This paper introduces a sample-split regression (SREG) estimator for survey data. It uses K-fold cross-fitting to remove the bias that the standard GREG estimator can incur when the number of auxiliary variables grows with the sample size.
Contribution
The paper proposes a new SREG estimator that eliminates high-dimensional bias in survey regression, without requiring root-n consistent estimation of regression coefficients.
Findings
SREG removes bias in high-dimensional survey regression models.
SREG maintains efficiency comparable to existing estimators.
Theoretical results establish asymptotic normality and variance consistency.
Abstract
Model-assisted regression estimation is fundamental in survey sampling for incorporating auxiliary information. However, when the auxiliary dimension grows with the sample size, the standard generalized regression (GREG) estimator can exhibit non-negligible bias under informative sampling, even when the working model is correctly specified. This failure stems from the double use of sampled outcomes, both for fitting the regression and for forming the residual correction. We propose a sample-split REGression (SREG) estimator based on K-fold cross-fitting that eliminates this bias by pairing each unit's residual with an out-of-fold prediction. The resulting estimator is first-order equivalent to the oracle difference estimator under a weak prediction-norm consistency requirement, without requiring root-n consistent estimation of regression coefficients. We establish asymptotic…
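The cross-fitting idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `fit_ridge` and `sreg_total`, the weighted ridge working model, the random fold assignment, and the choice to average the K fold-specific population predictions are all assumptions made for illustration. The only feature taken from the abstract is that each sampled unit's residual is computed from a model fitted without that unit's fold.

```python
import numpy as np


def fit_ridge(X, y, w, lam):
    """Weighted ridge fit (hypothetical working model, not from the paper).

    Solves argmin_b sum_i w_i (y_i - x_i'b)^2 + lam * ||b||^2.
    """
    XtW = X.T * w  # scales column i of X.T by the weight w_i
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)


def sreg_total(X, y, d, x_pop_total, K=5, lam=1.0, seed=0):
    """Cross-fitted difference estimator of a population total (sketch).

    X           : (n, p) auxiliary variables for the sampled units
    y           : (n,)   sampled outcomes
    d           : (n,)   design weights (inverse inclusion probabilities)
    x_pop_total : (p,)   known population totals of the auxiliary variables
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % K  # assumed: simple random fold labels
    total = 0.0
    for k in range(K):
        held_out = folds == k
        # Fit on the other K-1 folds only, so held-out outcomes are not reused.
        beta = fit_ridge(X[~held_out], y[~held_out], d[~held_out], lam)
        # Out-of-fold residuals for the held-out units.
        resid = y[held_out] - X[held_out] @ beta
        # Assumed form: each fold contributes 1/K of its population prediction
        # plus the weighted out-of-fold residual correction.
        total += x_pop_total @ beta / K + d[held_out] @ resid
    return total
```

In this sketch, an intercept is handled by including a column of ones in `X`, in which case the matching entry of `x_pop_total` is the population size. When the outcome is an exact linear function of the auxiliary variables and `lam=0.0`, every fold recovers the same coefficients, the residual term vanishes, and the function returns the true population total.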
