Nonparametric Statistical Inference and Imputation for Incomplete Categorical Data
Chaojie Wang, Linghao Shen, Han Li, Xiaodan Fan

TL;DR
This paper introduces DPMCPM, a nonparametric Bayesian method for modeling and imputing incomplete categorical data. It captures complex associations among variables as well as general missing-data mechanisms, and outperforms existing methods in both inference and imputation.
Contribution
The paper proposes DPMCPM, a flexible Bayesian model that jointly handles missing data in categorical variables using an infinite mixture of product-multinomials, improving inference and imputation.
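The phrase "infinite mixture of product-multinomials" can be made concrete with a short sketch. Within a latent class k the variables are independent, so P(x | z = k) = prod_j psi[k][j][x_j], and the marginal is P(x) = sum_k pi_k * prod_j psi[k][j][x_j]. The sketch below assumes a truncated stick-breaking approximation to the Dirichlet process. The truncation level, the concentration `alpha`, and the symmetric Dirichlet draws for the class-specific probabilities are illustrative choices rather than the paper's settings, and all function names are hypothetical.

```python
import itertools
import random
from typing import List, Sequence


def stick_breaking_weights(alpha: float, truncation: int, rng: random.Random) -> List[float]:
    """Truncated stick-breaking weights pi_1..pi_K (an approximation to a DP, assumed here)."""
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component absorbs the leftover mass so weights sum to 1
    return weights


def random_simplex(n: int, rng: random.Random, concentration: float = 1.0) -> List[float]:
    """Draw from a symmetric Dirichlet(concentration) by normalizing Gamma variates."""
    g = [rng.gammavariate(concentration, 1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]


def product_multinomial_prob(x: Sequence[int], psi: Sequence[Sequence[float]]) -> float:
    """P(x | class) = prod_j psi[j][x_j]: variables are independent within a class."""
    p = 1.0
    for j, level in enumerate(x):
        p *= psi[j][level]
    return p


def mixture_prob(x: Sequence[int], weights: Sequence[float], class_params) -> float:
    """Marginal P(x) = sum_k pi_k * prod_j psi[k][j][x_j]."""
    return sum(w * product_multinomial_prob(x, psi) for w, psi in zip(weights, class_params))


if __name__ == "__main__":
    rng = random.Random(0)
    n_levels = [2, 3]  # two categorical variables with 2 and 3 levels
    K = 10             # illustrative truncation level
    weights = stick_breaking_weights(alpha=1.0, truncation=K, rng=rng)
    class_params = [[random_simplex(m, rng) for m in n_levels] for _ in range(K)]
    cells = itertools.product(*(range(m) for m in n_levels))
    total = sum(mixture_prob(x, weights, class_params) for x in cells)
    print(round(total, 10))  # 1.0
```

Because each class is a proper product-multinomial and the weights sum to one, the probabilities over all category combinations sum to one. Dependence between variables comes from mixing over classes, not from any single class.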
Findings
DPMCPM outperforms existing methods in simulations.
It effectively models complex associations among variables.
The method handles various missing data mechanisms.
Abstract
Missingness in categorical data is a common problem in various real applications. Traditional approaches either utilize only the complete observations or impute the missing data by some ad hoc methods rather than the true conditional distribution of the missing data, thus losing or distorting the rich information in the partial observations. In this paper, we propose the Dirichlet Process Mixture of Collapsed Product-Multinomials (DPMCPM) to model the full data jointly and compute the model efficiently. By fitting an infinite mixture of product-multinomial distributions, DPMCPM is applicable for any categorical data regardless of the true distribution, which may contain complex association among variables. Under the framework of latent class analysis, we show that DPMCPM can model general missing mechanisms by creating an extra category to denote missingness, which implicitly integrates…
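The "extra category to denote missingness" can be illustrated with a hypothetical sketch. This is not the authors' DPMCPM sampler, and the fitted weights and class parameters are simply taken as given. Each variable's class-specific probability vector gets one extra entry for "missing", so a row's missingness pattern itself informs its latent-class posterior. To impute, a class is drawn from that posterior, and the missing value is drawn from that class's distribution over the real levels with the missing entry dropped. That last step is an assumption made here for illustration. The names `MISSING`, `encode_with_missing_category`, `class_posterior`, and `impute_row` are hypothetical.

```python
import random
from typing import List, Optional, Sequence

MISSING = None  # hypothetical marker for an unobserved entry


def encode_with_missing_category(row: Sequence[Optional[int]], n_levels: Sequence[int]) -> List[int]:
    """Map a missing x_j to the extra level index n_levels[j]; levels 0..n_levels[j]-1 are real."""
    return [n_levels[j] if v is MISSING else v for j, v in enumerate(row)]


def class_posterior(encoded: Sequence[int], weights: Sequence[float], class_params) -> List[float]:
    """P(z = k | encoded row). class_params[k][j] has n_levels[j] + 1 entries,
    the last being the class-specific probability that variable j is missing."""
    scores = []
    for w, psi in zip(weights, class_params):
        s = w
        for j, code in enumerate(encoded):
            s *= psi[j][code]
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]


def impute_row(row, n_levels, weights, class_params, rng: random.Random) -> List[int]:
    """Draw z from its posterior, then draw each missing x_j from class z's real levels."""
    encoded = encode_with_missing_category(row, n_levels)
    post = class_posterior(encoded, weights, class_params)
    k = rng.choices(range(len(post)), weights=post)[0]
    completed = []
    for j, v in enumerate(row):
        if v is MISSING:
            # Drop the extra "missing" entry; rng.choices renormalizes the remaining weights.
            real_probs = class_params[k][j][: n_levels[j]]
            completed.append(rng.choices(range(n_levels[j]), weights=real_probs)[0])
        else:
            completed.append(v)
    return completed


if __name__ == "__main__":
    n_levels = [2, 3]
    weights = [0.5, 0.5]
    class_params = [
        [[0.7, 0.2, 0.1], [0.5, 0.3, 0.1, 0.1]],  # class 0: last entry of each list is P(missing)
        [[0.1, 0.1, 0.8], [0.0, 0.0, 0.9, 0.1]],  # class 1
    ]
    row = [0, MISSING]
    print(encode_with_missing_category(row, n_levels))  # [0, 3]
    print(class_posterior(encode_with_missing_category(row, n_levels), weights, class_params))
    print(impute_row(row, n_levels, weights, class_params, random.Random(0)))
```

Because the missing-level probability can differ across classes, which entries are missing shifts a row's class posterior. This is how such an encoding can accommodate mechanisms beyond missing completely at random.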
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Statistical Methods and Inference
