Fast model-based clustering of partial records

Emily M. Goren; Ranjan Maitra

arXiv:2103.16336 · stat.ME · October 20, 2021

TL;DR

This paper introduces a fast, model-based clustering method for partially recorded data using a finite mixture of multivariate t distributions, which is computationally more efficient and often more accurate than traditional imputation or case deletion methods.

Contribution

It develops a novel clustering algorithm that models only the observed values with a mixture of t distributions, avoiding both imputation and the full EM algorithm, and shows in simulations and on real data that it matches or outperforms these alternatives.
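Modeling only the observed values works because the multivariate t family is closed under marginalization: the observed coordinates of a t-distributed vector are again t-distributed, with the same degrees of freedom and the corresponding sub-vector of the location and sub-block of the scale matrix. Under an assumed standard parametrization (not necessarily the paper's notation), with $O_i$ the set of observed coordinates of record $i$, $p_i = |O_i|$, mixing weights $\pi_k$, locations $\boldsymbol{\mu}_k$, scale matrices $\boldsymbol{\Sigma}_k$ and degrees of freedom $\nu_k$, the observed-data density of a $K$-component mixture is

$$
f(\mathbf{x}_{i,O_i}) = \sum_{k=1}^{K} \pi_k \, t_{p_i}\!\left(\mathbf{x}_{i,O_i};\ \boldsymbol{\mu}_{k,O_i},\ \boldsymbol{\Sigma}_{k,O_iO_i},\ \nu_k\right),
\qquad
t_{p}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma},\nu) = \frac{\Gamma\!\left(\frac{\nu+p}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)(\nu\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}}\left[1+\frac{(\mathbf{x}-\boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}{\nu}\right]^{-(\nu+p)/2}.
$$

No missing coordinate appears anywhere in this expression, so nothing has to be imputed.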

Findings

1. The method is computationally more efficient than imputation and full EM.
2. It achieves better cluster recovery than case deletion and imputation.
3. It performs comparably to or better than full EM, even when the missing-at-random (MAR) assumption is violated.

Abstract

Partially recorded data are frequently encountered in many applications and usually clustered by first removing incomplete cases or features with missing values, or by imputing missing values, followed by application of a clustering algorithm to the resulting altered dataset. Here, we develop clustering methodology through a model-based approach using the marginal density for the observed values, assuming a finite mixture model of multivariate $t$ distributions. We compare our approximate algorithm to the corresponding full expectation-maximization (EM) approach that considers the missing values in the incomplete data set and makes a missing at random (MAR) assumption, as well as case deletion and imputation methods. Since only the observed values are utilized, our approach is computationally more efficient than imputation or full EM. Simulation studies demonstrate that our approach has…
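The central computation in the abstract, the marginal density of the observed values under a mixture of multivariate $t$ distributions, can be illustrated in a few lines. The following is a minimal pure-Python sketch, assuming missing entries are encoded as `float("nan")` and each component has its own degrees of freedom; the helper names (`cholesky`, `mvt_logpdf`, `observed_loglik`) and the parameter layout are hypothetical and are not taken from the MixtClust repository. For one record it returns the mixture log-density of the observed coordinates and the posterior cluster-membership probabilities, i.e. the quantities an E-step would need. The paper's parameter updates are not reproduced here.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvt_logpdf(x, mu, sigma, nu):
    """Log density of a p-variate t with location mu, scale matrix sigma, df nu."""
    p = len(x)
    L = cholesky(sigma)
    # Forward-solve L z = x - mu, so z^T z = (x - mu)^T sigma^{-1} (x - mu).
    z = []
    for i in range(p):
        r = x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    delta = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))  # log|sigma|
    return (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * p * math.log(nu * math.pi) - 0.5 * log_det
            - 0.5 * (nu + p) * math.log1p(delta / nu))


def observed_loglik(record, weights, mus, sigmas, nus):
    """Hypothetical helper: mixture log-density of the observed (non-NaN)
    entries of one record, plus posterior membership probabilities.
    weights, mus, sigmas, nus are lists indexed by component k."""
    obs = [j for j, v in enumerate(record) if not math.isnan(v)]
    x_o = [record[j] for j in obs]
    terms = []
    for pi_k, mu, sigma, nu in zip(weights, mus, sigmas, nus):
        # Marginal of a multivariate t on the observed coordinates: same nu,
        # location and scale restricted to those coordinates.
        mu_o = [mu[j] for j in obs]
        sigma_oo = [[sigma[a][b] for b in obs] for a in obs]
        terms.append(math.log(pi_k) + mvt_logpdf(x_o, mu_o, sigma_oo, nu))
    # Log-sum-exp over components for numerical stability.
    m = max(terms)
    log_total = m + math.log(sum(math.exp(t - m) for t in terms))
    posterior = [math.exp(t - log_total) for t in terms]
    return log_total, posterior


if __name__ == "__main__":
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    record = [0.3, float("nan"), -1.2]  # second feature is missing
    ll, post = observed_loglik(record, [0.4, 0.6],
                               [[0.0, 0.0, 0.0], [2.0, 1.0, -1.0]],
                               [eye, eye], [4.0, 10.0])
    print(ll, post)
```

Assigning each record to the component with the largest posterior probability yields a hard clustering. Because each record only ever touches the sub-vector and sub-matrix for its own observed coordinates, no missing value is filled in or integrated over explicitly, which is the source of the efficiency the abstract describes.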

Code & Models

Repositories

emilygoren/MixtClust (official)