Stop or Continue Data Collection: A Nonignorable Missing Data Approach for Continuous Variables
Thais Paiva, Jerry Reiter

TL;DR
This paper introduces a method for guiding nonresponse follow-up sampling decisions by creating completed datasets through imputation under various nonresponse assumptions, then evaluating accuracy and cost for different sample sizes.
Contribution
It presents a novel imputation approach for multivariate continuous data with nonignorable nonresponse using mixture models, aiding informed decision-making in data collection.
Findings
Effective imputation method for nonignorable nonresponse
Application to U.S. Census data demonstrates practical utility
Framework for balancing data accuracy and collection costs
Abstract
We present an approach to inform decisions about nonresponse follow-up sampling. The basic idea is to (i) create completed samples by imputing nonrespondents' data under various assumptions about the nonresponse mechanisms, (ii) take hypothetical samples of varying sizes from the completed samples, and (iii) compute and compare measures of accuracy and cost for the different proposed sample sizes. As part of the methodology, we present a new approach for generating imputations for multivariate continuous data with nonignorable unit nonresponse. We fit mixtures of multivariate normal distributions to the respondents' data, and adjust the probabilities of the mixture components to generate nonrespondents' distributions with desired features. We illustrate the approaches using data from the 2007 U.S. Census of Manufactures.
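The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the mixture parameters are invented rather than fitted, the covariances are taken to be diagonal, the "tilt" factors that shift probability toward one component are hypothetical, and accuracy is measured as the RMSE of a single mean. The function names `tilt_weights`, `draw_from_mixture`, and `evaluate_sizes` are likewise made up here.

```python
import random
import statistics


def tilt_weights(weights, tilt):
    """Rescale mixture component probabilities by tilt factors and renormalize."""
    raw = [w * t for w, t in zip(weights, tilt)]
    total = sum(raw)
    return [r / total for r in raw]


def draw_from_mixture(rng, weights, means, sds, n):
    """Draw n vectors from a normal mixture (diagonal covariance, an assumption)."""
    draws = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append([rng.gauss(m, s) for m, s in zip(means[k], sds[k])])
    return draws


def evaluate_sizes(rng, respondents, imputed, sizes, unit_cost, reps=200):
    """For each follow-up size, report RMSE of the mean of variable 0 and the cost."""
    n_r, n_nr = len(respondents), len(imputed)
    mean_r = statistics.fmean(row[0] for row in respondents)

    def combine(mean_nr):
        # Completed-sample mean: respondents plus (estimated) nonrespondents.
        return (n_r * mean_r + n_nr * mean_nr) / (n_r + n_nr)

    target = combine(statistics.fmean(row[0] for row in imputed))
    results = []
    for m in sizes:
        sq_errors = []
        for _ in range(reps):
            sample = rng.sample(imputed, m)  # hypothetical follow-up sample
            est = combine(statistics.fmean(row[0] for row in sample))
            sq_errors.append((est - target) ** 2)
        rmse = statistics.fmean(sq_errors) ** 0.5
        results.append({"size": m, "rmse": rmse, "cost": m * unit_cost})
    return results


rng = random.Random(0)

# Invented mixture parameters standing in for a fit to respondents' data.
weights = [0.5, 0.3, 0.2]
means = [[10.0, 5.0], [20.0, 8.0], [40.0, 15.0]]
sds = [[2.0, 1.0], [3.0, 1.5], [6.0, 3.0]]
respondents = draw_from_mixture(rng, weights, means, sds, 700)

# Step (i): one nonignorable scenario, where nonrespondents fall into the
# third component three times as often (relative to the others) as respondents.
nonresp_weights = tilt_weights(weights, [1.0, 1.0, 3.0])
imputed = draw_from_mixture(rng, nonresp_weights, means, sds, 300)

# Steps (ii) and (iii): hypothetical follow-up samples of several sizes.
results = evaluate_sizes(rng, respondents, imputed, [30, 100, 300], unit_cost=50.0)
for row in results:
    print(row)
```

Repeating the run with different tilt factors gives one accuracy and cost table per nonresponse scenario, which is the kind of comparison the abstract describes for deciding whether to continue data collection.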
