Parsimonious mixtures of multivariate contaminated normal distributions
Antonio Punzo, Paul D. McNicholas

TL;DR
This paper introduces a flexible mixture model for clustering that accommodates outliers without requiring the outlier-related parameters to be specified in advance. Parsimony comes from an eigen-decomposition of the component covariance matrices, parameters are estimated with an expectation-conditional maximization algorithm, and the approach is evaluated on simulated and real data.
Contribution
It develops a mixture of multivariate contaminated normal distributions in which each cluster has its own outlier-proportion and contamination-degree parameters, both estimated from the data rather than fixed a priori, which makes the clustering more robust and flexible.
Findings
The model identifies both clusters and outliers in simulated data.
Comparisons with well-established finite mixtures show improved robustness to outliers.
The approach is applied successfully to artificial and real datasets.
Abstract
A mixture of multivariate contaminated normal distributions is developed for model-based clustering. In addition to the parameters of the classical normal mixture, our contaminated mixture has, for each cluster, a parameter controlling the proportion of mild outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all the members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. Using a large scale simulation study, the behaviour of the proposed approach is investigated and comparison with well-established finite mixtures is…
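To make the two per-cluster parameters concrete, here is a minimal sketch of a single contaminated normal component. It assumes the usual two-part form: a fraction `alpha` of typical points drawn from a normal with covariance `S`, and the remaining `1 - alpha` drawn from a normal with the same mean and an inflated covariance `eta * S`, with `eta > 1`. The abstract does not give the formula, so this parameterization and every function name below are illustrative assumptions, not the authors' implementation.

```python
import math

def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def normal_pdf(x, mu, S, inflate=1.0):
    """Multivariate normal density with covariance inflate * S."""
    p = len(x)
    L = cholesky(S)
    # Solve L z = x - mu by forward substitution; the sum of z_i^2 is the
    # squared Mahalanobis distance with respect to S.
    z = []
    for i in range(p):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((x[i] - mu[i] - s) / L[i][i])
    d2 = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    # Scaling S by `inflate` divides the distance by it and adds
    # p * log(inflate) to the log-determinant.
    log_pdf = -0.5 * (p * math.log(2.0 * math.pi) + logdet
                      + p * math.log(inflate) + d2 / inflate)
    return math.exp(log_pdf)

def contaminated_normal_pdf(x, mu, S, alpha, eta):
    """Assumed form: alpha * N(mu, S) + (1 - alpha) * N(mu, eta * S)."""
    return (alpha * normal_pdf(x, mu, S)
            + (1.0 - alpha) * normal_pdf(x, mu, S, inflate=eta))

def prob_typical(x, mu, S, alpha, eta):
    """Posterior probability that x is a typical point rather than an outlier."""
    good = alpha * normal_pdf(x, mu, S)
    bad = (1.0 - alpha) * normal_pdf(x, mu, S, inflate=eta)
    return good / (good + bad)

if __name__ == "__main__":
    mu, S = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for point in ([0.0, 0.0], [5.0, 5.0]):
        print(point, prob_typical(point, mu, S, alpha=0.9, eta=10.0))
```

Under these assumptions, a point at the cluster centre receives a probability close to 1 of being typical, while a point far from the centre is attributed almost entirely to the inflated part and would be flagged as an outlier. In a full mixture, `alpha` and `eta` would be estimated per cluster along with the other parameters rather than fixed as in this sketch.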
