TL;DR
This paper introduces a fast, model-based method for clustering partially recorded data using a finite mixture of multivariate t distributions. The method is computationally more efficient, and often more accurate, than traditional imputation or case deletion.
Contribution
It develops a clustering algorithm that models only the observed values, through their marginal density under a finite mixture of multivariate t distributions. This avoids both imputation and full EM, and the algorithm demonstrates superior performance in simulations and on real data.
Findings
The method is computationally more efficient than imputation and full EM.
It achieves better cluster recovery than case deletion and imputation.
It performs comparably to or better than full EM, even when the MAR assumption is violated.
Abstract
Partially recorded data are frequently encountered in many applications and usually clustered by first removing incomplete cases or features with missing values, or by imputing missing values, followed by application of a clustering algorithm to the resulting altered dataset. Here, we develop clustering methodology through a model-based approach using the marginal density for the observed values, assuming a finite mixture model of multivariate t distributions. We compare our approximate algorithm to the corresponding full expectation-maximization (EM) approach that considers the missing values in the incomplete dataset and makes a missing at random (MAR) assumption, as well as case deletion and imputation methods. Since only the observed values are utilized, our approach is computationally more efficient than imputation or full EM. Simulation studies demonstrate that our approach has…
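The idea of working with the marginal density of the observed values can be illustrated with a minimal sketch. It relies on a standard property of the multivariate t distribution: the marginal over any subset of coordinates is again multivariate t, with the same degrees of freedom and the corresponding sub-vector of the location and sub-matrix of the scale. The function names `t_marginal_logpdf` and `responsibilities`, the NaN encoding of missing entries, and the parameter layout are assumptions made for illustration; this is not the authors' implementation, and it shows only the density evaluation and cluster-membership step, not parameter estimation.

```python
import math
import numpy as np

def t_marginal_logpdf(x, mu, sigma, nu):
    """Log-density of the observed (non-NaN) entries of x under a
    multivariate t with location mu, scale matrix sigma and nu degrees
    of freedom. Missing entries are simply marginalized out."""
    obs = ~np.isnan(x)
    q = int(obs.sum())
    if q == 0:
        return 0.0  # a fully missing record contributes no information
    diff = x[obs] - mu[obs]
    sub = sigma[np.ix_(obs, obs)]          # scale sub-matrix for observed coords
    chol = np.linalg.cholesky(sub)
    z = np.linalg.solve(chol, diff)
    maha = float(z @ z)                    # squared Mahalanobis distance
    logdet = 2.0 * float(np.log(np.diag(chol)).sum())
    return (math.lgamma((nu + q) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * q * math.log(nu * math.pi) - 0.5 * logdet
            - 0.5 * (nu + q) * math.log1p(maha / nu))

def responsibilities(x, weights, mus, sigmas, nus):
    """Posterior cluster-membership probabilities for one partial record,
    computed from the observed-value marginal densities only."""
    logs = np.array([
        math.log(w) + t_marginal_logpdf(x, m, s, n)
        for w, m, s, n in zip(weights, mus, sigmas, nus)
    ])
    logs -= logs.max()                     # stabilize before exponentiating
    p = np.exp(logs)
    return p / p.sum()

# Hypothetical two-component example: the second coordinate is missing.
x = np.array([1.0, np.nan])
mus = [np.zeros(2), np.full(2, 5.0)]
sigmas = [np.eye(2), np.eye(2)]
r = responsibilities(x, [0.5, 0.5], mus, sigmas, [3.0, 3.0])
```

Because each record only needs the sub-vector and sub-matrix matching its own observed coordinates, no values are imputed and no conditional expectations of the missing entries are computed in this step.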
