Cluster analysis and outlier detection with missing data
Hung Tong, Cristina Tortora

TL;DR
This paper introduces a framework for fitting a mixture of multivariate contaminated normal distributions to incomplete data sets, using an expectation-conditional maximization algorithm for parameter estimation, and compares it with a mixture of Student's t distributions in a simulation study.
Contribution
It extends mixtures of multivariate contaminated normal distributions, which previously could be fitted only to complete data sets, to data sets with values missing at random.
Findings
The proposed method effectively handles missing data in clustering.
Simulation results show competitive performance with Student's t mixture models.
The framework improves robustness in the presence of outliers and missingness.
Abstract
A mixture of multivariate contaminated normal (MCN) distributions is a useful model-based clustering technique to accommodate data sets with mild outliers. However, this model can only be fitted to complete data sets, and data sets in real applications are often incomplete. In this paper, we develop a framework for fitting a mixture of MCN distributions to incomplete data sets, i.e., data sets with some values missing at random. We employ the expectation-conditional maximization algorithm for parameter estimation. We use a simulation study to compare the results of our model and a mixture of Student's t distributions for incomplete data.
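To make the model concrete, the following is a minimal sketch of how a single MCN component can be evaluated on an observation with missing entries. It is not the authors' implementation. It relies on the standard MCN density, alpha * N(x; mu, Sigma) + (1 - alpha) * N(x; mu, eta * Sigma) with eta > 1, and on the fact that integrating out missing coordinates leaves an MCN density on the observed coordinates with the corresponding sub-vector of mu and sub-matrix of Sigma. The function names `mcn_observed_density` and `prob_good`, the parameter names, and the use of NaN to mark missing values are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal


def mcn_observed_density(x, mu, sigma, alpha, eta):
    """Return (total, good_part) for the observed coordinates of x.

    NaN entries of x are treated as missing and integrated out.
    alpha is the proportion of "good" points; eta > 1 inflates the
    covariance of the "bad" (outlying) points. Illustrative names only.
    """
    x = np.asarray(x, dtype=float)
    obs = ~np.isnan(x)  # mask of observed coordinates
    if not obs.any():
        raise ValueError("at least one coordinate must be observed")

    x_o = x[obs]
    mu_o = np.asarray(mu, dtype=float)[obs]
    sigma_oo = np.asarray(sigma, dtype=float)[np.ix_(obs, obs)]

    # Good part: ordinary normal; bad part: same mean, inflated covariance.
    good = alpha * multivariate_normal.pdf(x_o, mean=mu_o, cov=sigma_oo)
    bad = (1.0 - alpha) * multivariate_normal.pdf(x_o, mean=mu_o, cov=eta * sigma_oo)
    return good + bad, good


def prob_good(x, mu, sigma, alpha, eta):
    """Posterior probability that x is a good (non-outlying) point."""
    total, good = mcn_observed_density(x, mu, sigma, alpha, eta)
    return good / total


mu = np.zeros(2)
sigma = np.eye(2)
print(prob_good([0.0, np.nan], mu, sigma, alpha=0.9, eta=10.0))
print(prob_good([6.0, np.nan], mu, sigma, alpha=0.9, eta=10.0))
```

In the contaminated normal literature a point is commonly flagged as an outlier when this posterior probability falls below 0.5; here the second observation, far from the mean in its only observed coordinate, would be flagged under that convention. In a full mixture, the same quantities would be computed per component and combined with the mixing proportions.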
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Distribution Estimation and Applications · Advanced Statistical Methods and Models
