Multiple Scaled Contaminated Normal Distribution and Its Application in Clustering
Antonio Punzo, Cristina Tortora

TL;DR
This paper introduces the multiple scaled contaminated normal (MSCN) distribution, a flexible model for robust multivariate clustering that accounts for dimension-specific outliers and improves detection and estimation in contaminated data.
Contribution
It proposes the MSCN distribution with dimension-specific contamination parameters and an EM algorithm extension for robust clustering, advancing outlier detection and parameter estimation.
Findings
MSCN effectively detects bad points in each dimension.
The model improves clustering robustness over existing heavy-tailed distributions.
Simulation studies and real-data analyses show improved performance in contaminated scenarios.
Abstract
The multivariate contaminated normal (MCN) distribution represents a simple heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptical contoured scatters in the presence of mild outliers, referred to as "bad" points. The MCN can also automatically detect bad points. The price of these advantages is two additional parameters, both with specific and useful interpretations: proportion of good observations and degree of contamination. However, points may be bad in some dimensions but good in others. The use of an overall proportion of good observations and of an overall degree of contamination is limiting. To overcome this limitation, we propose a multiple scaled contaminated normal (MSCN) distribution with a proportion of good observations and a degree of contamination for each dimension. Once the model is fitted, each observation has a posterior…
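The two MCN parameters described in the abstract can be illustrated with a short sketch. It assumes the standard two-component form of the contaminated normal density, alpha * N(mu, Sigma) + (1 - alpha) * N(mu, eta * Sigma) with eta > 1, and flags a point as bad when its posterior probability of being good falls below 0.5. The per-dimension function is only an assumption about how the MSCN idea might look: it applies a univariate contaminated normal along each eigenvector of Sigma, which is not spelled out in the text above. All function names are hypothetical and NumPy is the only dependency.

```python
import numpy as np


def mvn_pdf(x, mu, sigma):
    """Multivariate normal density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance of every row from mu.
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha))


def mcn_pdf(x, mu, sigma, alpha, eta):
    """MCN density: alpha = proportion of good points, eta > 1 = degree of contamination."""
    good = alpha * mvn_pdf(x, mu, sigma)
    bad = (1.0 - alpha) * mvn_pdf(x, mu, eta * sigma)
    return good + bad


def mcn_prob_good(x, mu, sigma, alpha, eta):
    """Posterior probability that each row of x is a good point (single overall value)."""
    good = alpha * mvn_pdf(x, mu, sigma)
    return good / mcn_pdf(x, mu, sigma, alpha, eta)


def per_dimension_prob_good(x, mu, sigma, alphas, etas):
    """ASSUMED MSCN-style sketch: one posterior per principal axis of sigma.

    alphas and etas are length-d arrays, one entry per axis, ordered like the
    eigenvalues returned by numpy.linalg.eigh (ascending).
    """
    lam, gamma = np.linalg.eigh(sigma)       # eigenvalues, eigenvectors (columns)
    y = (np.atleast_2d(x) - mu) @ gamma      # coordinates along the principal axes

    def normal_1d(values, var):
        return np.exp(-0.5 * values**2 / var) / np.sqrt(2.0 * np.pi * var)

    good = alphas * normal_1d(y, lam)
    bad = (1.0 - alphas) * normal_1d(y, etas * lam)
    return good / (good + bad)               # shape (n, d)


if __name__ == "__main__":
    mu = np.zeros(2)
    sigma = np.diag([1.0, 4.0])
    points = np.array([[0.0, 0.0], [6.0, 0.0]])

    overall = mcn_prob_good(points, mu, sigma, alpha=0.9, eta=10.0)
    by_axis = per_dimension_prob_good(
        points, mu, sigma, alphas=np.array([0.9, 0.9]), etas=np.array([10.0, 10.0])
    )
    print("overall P(good):", overall)
    print("per-axis P(good):", by_axis)
    print("flagged bad per axis:", by_axis < 0.5)
```

In this toy setup the point (6, 0) lies far out along the first axis only, so a per-axis posterior can mark it as bad in that direction while leaving it good in the other, which a single overall posterior cannot express.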
