Nearest neighbor ratio imputation with incomplete multi-nomial outcome in survey sampling
Chenyin Gao, Katherine Jenny Thompson, Shu Yang, Jae Kwang Kim

TL;DR
This paper develops a ratio-based nearest neighbor imputation method for incomplete multinomial survey data, together with a variance estimator that remains valid when sampling fractions are non-negligible, and illustrates the method on empirical US survey data.
Contribution
It introduces a novel ratio imputation estimator using auxiliary variables and derives a valid variance estimator accounting for large sampling fractions.
Findings
Proposed estimator performs well in simulations.
Variance estimator remains accurate with non-negligible sampling fractions.
Method effectively estimates detailed expenditure items.
Abstract
Nonresponse is a common problem in survey sampling. Appropriate treatment can be challenging, especially when dealing with detailed breakdowns of totals. Often, the nearest neighbor imputation method is used to handle such incomplete multinomial data. In this article, we investigate the nearest neighbor ratio imputation estimator, in which auxiliary variables are used to identify the closest donor and the vector of proportions from the donor is applied to the total of the recipient to implement ratio imputation. To estimate the asymptotic variance, we first treat the nearest neighbor ratio imputation as a special case of predictive matching imputation and apply the linearization method of Yang and Kim (2020). To account for the non-negligible sampling fractions, parametric and generalized additive models are employed to incorporate the smoothness of the imputation estimator, which…
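The imputation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `nn_ratio_impute`, the Euclidean distance on the auxiliary variables, and the assumption that each unit's total is observed (and positive for donors) even when its detailed breakdown is missing are all choices made here for illustration. Variance estimation is not covered.

```python
import numpy as np

def nn_ratio_impute(x, totals, details, responded):
    """Hypothetical helper: nearest neighbor ratio imputation sketch.

    x         : (n,) or (n, p) auxiliary variables, observed for all units
    totals    : (n,) totals, assumed observed for all units
    details   : (n, K) detailed breakdown, missing for nonrespondents
    responded : (n,) boolean flags, True where the breakdown is observed
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    totals = np.asarray(totals, dtype=float)
    details = np.asarray(details, dtype=float)
    responded = np.asarray(responded, dtype=bool)

    imputed = details.copy()
    donors = np.flatnonzero(responded)
    # Vector of proportions for each donor (assumes positive donor totals).
    donor_props = details[donors] / totals[donors, None]

    for i in np.flatnonzero(~responded):
        # Closest donor in the auxiliary-variable space (Euclidean distance).
        dist = np.linalg.norm(x[donors] - x[i], axis=1)
        j = np.argmin(dist)
        # Apply the donor's proportions to the recipient's own total.
        imputed[i] = totals[i] * donor_props[j]
    return imputed

# Toy data: the last unit reports a total of 50 but no breakdown.
x = [1.0, 2.0, 5.0, 5.2]
totals = [100.0, 200.0, 400.0, 50.0]
details = [[60.0, 40.0], [50.0, 150.0], [100.0, 300.0], [np.nan, np.nan]]
responded = [True, True, True, False]

imputed = nn_ratio_impute(x, totals, details, responded)
print(imputed[3])  # donor is the third unit, with proportions (0.25, 0.75)
```

Because the donor's proportions sum to one, the imputed items for a recipient add up to that recipient's reported total, which is the property that makes ratio imputation attractive for detailed breakdowns.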
Taxonomy
Topics: Survey Sampling and Estimation Techniques · Survey Methodology and Nonresponse · Statistical Methods and Bayesian Inference
