Simultaneous Edit and Imputation for Household Data with Structural Zeros
Olanrewaju Akande, Andrés Barrientos, Jerome P. Reiter

TL;DR
This paper introduces a Bayesian hierarchical model for simultaneous editing and imputation of household data. The model handles both erroneous and missing values while preserving relationships within households, and the method is demonstrated on data from the 2012 American Community Survey.
Contribution
The paper develops a new model-based framework that jointly handles erroneous and missing values in household surveys and produces completed datasets that satisfy all edit constraints.
Findings
Successfully imputes missing values and corrects errors in household data.
Generates plausible datasets satisfying all edit constraints.
Preserves multivariate relationships within households.
Abstract
Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than his biological parent's age), as well as missing values. Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations…
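To make the three model components more concrete, here is a toy Python sketch of the generative direction only, under heavy simplifying assumptions. Households are lists of (relationship, age) pairs; the only edit rule is that each child must be younger than the household head (standing in for the parent); true ages are drawn uniformly rather than from a nested Dirichlet process mixture; truncation is realized by simple rejection; and the error-location and reporting models (a fixed per-value error rate, with an erroneous age replaced by a different age drawn uniformly) are hypothetical. All function names are illustrative. This is not the authors' implementation and omits the posterior inference that performs the actual editing and imputation.

```python
import random

# Toy sketch only; not the paper's implementation.
# A household is a list of (relationship, age) pairs. The relationship codes,
# the age range, and the single edit rule are hypothetical simplifications.
AGES = list(range(0, 100))


def satisfies_edits(household):
    """Hypothetical edit rule: exactly one head, and every child is younger
    than the head (treating the head as the child's parent)."""
    heads = [age for rel, age in household if rel == "head"]
    if len(heads) != 1:
        return False
    return all(age < heads[0] for rel, age in household if rel == "child")


def sample_untruncated(rng, n_children):
    """Draw ages independently and uniformly (a crude stand-in for one
    mixture component of a product-of-multinomials model)."""
    household = [("head", rng.choice(AGES))]
    household += [("child", rng.choice(AGES)) for _ in range(n_children)]
    return household


def sample_truncated(rng, n_children, max_tries=10_000):
    """Truncation by rejection: redraw until the household passes every edit,
    so edit-failing households get zero probability under the latent model."""
    for _ in range(max_tries):
        household = sample_untruncated(rng, n_children)
        if satisfies_edits(household):
            return household
    raise RuntimeError("no edit-consistent household found")


def report(rng, household, error_rate):
    """Hypothetical error-location and reporting model: each age is in error
    with probability error_rate; an erroneous age is replaced by a different
    age drawn uniformly from AGES."""
    reported = []
    for rel, age in household:
        if rng.random() < error_rate:
            age = rng.choice([a for a in AGES if a != age])
        reported.append((rel, age))
    return reported


rng = random.Random(0)
true_household = sample_truncated(rng, n_children=2)
observed = report(rng, true_household, error_rate=0.2)
print("true:", true_household, "passes edits:", satisfies_edits(true_household))
print("reported:", observed, "passes edits:", satisfies_edits(observed))
```

The sketch shows how a household whose true values satisfy the edits can nonetheless be reported with values that violate them. The paper's model works in the reverse direction, inferring the latent true values and error locations from what was reported.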
