TL;DR
This paper empirically compares various multiple imputation methods for multivariate ordinal data, highlighting their relative performance and identifying the most effective approaches under different missing data scenarios.
Contribution
It provides a comprehensive evaluation of multiple imputation techniques for multivariate ordinal data, an area with limited prior research, using simulation studies based on real survey data.
Findings
Proportional odds logistic regression, classification trees, and Dirichlet process (DP) mixtures outperform other methods.
Multinomial logistic regression can perform well depending on missing data mechanisms.
Certain methods are more robust across different missing data scenarios.
Abstract
Missing data remains a very common problem in large datasets, including survey and census data containing many ordinal responses, such as political polls and opinion surveys. Multiple imputation (MI) is usually the go-to approach for analyzing such incomplete datasets, and there are indeed several implementations of MI, including methods using generalized linear models, tree-based models, and Bayesian non-parametric models. However, there is limited research on the statistical performance of these methods for multivariate ordinal data. In this article, we perform an empirical evaluation of several MI methods, including MI by chained equations (MICE) using multinomial logistic regression models, MICE using proportional odds logistic regression models, MICE using classification and regression trees, MICE using random forest, MI using Dirichlet process (DP) mixtures of products of…
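For readers new to MI, the analysis step that follows imputation can be sketched as below. This is a minimal illustration of the standard combining rules (Rubin's rules) for pooling an estimate across m imputed datasets. It is not taken from the paper, whose evaluation procedure is not shown in this excerpt. The function name `pool_rubin` and all numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances using Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the matching squared standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Hypothetical results from m = 5 imputed datasets, for illustration only.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.012, 0.011, 0.009, 0.008]
pooled_estimate, pooled_variance = pool_rubin(estimates, variances)
print(pooled_estimate, pooled_variance)
```

The between-imputation term is what separates MI from single imputation: it adds the uncertainty due to the missing values to the pooled variance, so imputation methods that produce poor draws tend to show up as biased estimates or miscalibrated intervals.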
