Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model
Antonio Punzo, Paul D. McNicholas

TL;DR
This paper introduces the contaminated Gaussian cluster-weighted model, a robust mixture-of-regressions approach that handles outliers and leverage points within each cluster without requiring the contamination parameters to be specified in advance.
Contribution
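The paper proposes a contaminated Gaussian CWM in which each mixture component has its own parameters for the proportion of outliers, the proportion of leverage points, and the degree of contamination in the responses and in the covariates. These parameters do not need to be pre-specified, which improves the robustness of both clustering and regression.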
Findings
The model effectively identifies outliers and leverage points within clusters.
Monte Carlo experiments demonstrate that the estimators are more robust than those of the Gaussian CWM.
Application to real data shows practical utility and enhanced classification accuracy.
Abstract
The Gaussian cluster-weighted model (CWM) is a mixture of regression models with random covariates that allows for flexible clustering of a random vector composed of response variables and covariates. In each mixture component, it adopts a Gaussian distribution for both the covariates and the responses given the covariates. To robustify the approach with respect to possible elliptical heavy-tailed departures from normality, due to the presence of atypical observations, the contaminated Gaussian CWM is introduced here. In addition to the parameters of the Gaussian CWM, each mixture component of our contaminated CWM has a parameter controlling the proportion of outliers, one controlling the proportion of leverage points, one specifying the degree of contamination with respect to the response variables, and one specifying the degree of contamination with respect to the covariates.…
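The structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single response and a single covariate, a linear conditional mean, and the usual two-part form of a contaminated Gaussian (a typical-point Gaussian plus a variance-inflated Gaussian with the same mean). All function names and dictionary keys (`prop_outliers`, `prop_leverage`, `infl_y`, `infl_x`, and so on) are hypothetical labels chosen for this illustration.

```python
import numpy as np

def gaussian_pdf(z, mean, var):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def contaminated_pdf(z, mean, var, prop_atypical, inflation):
    """Assumed contaminated Gaussian: a two-part mixture with a common mean.

    prop_atypical is the proportion of atypical points (in [0, 1));
    inflation (> 1) scales the variance of the atypical part.
    """
    typical = (1.0 - prop_atypical) * gaussian_pdf(z, mean, var)
    atypical = prop_atypical * gaussian_pdf(z, mean, inflation * var)
    return typical + atypical

def cwm_density(x, y, components):
    """Joint density of (x, y) under the sketched contaminated CWM.

    Each component multiplies a contaminated density for y given x
    (outliers) by a contaminated density for x (leverage points).
    """
    total = 0.0
    for c in components:
        cond_mean = c["b0"] + c["b1"] * x  # assumed linear regression of y on x
        f_y = contaminated_pdf(y, cond_mean, c["var_y"], c["prop_outliers"], c["infl_y"])
        f_x = contaminated_pdf(x, c["mu_x"], c["var_x"], c["prop_leverage"], c["infl_x"])
        total = total + c["weight"] * f_y * f_x
    return total

def outlier_probability(x, y, c):
    """Posterior probability, by Bayes' rule, that y is atypical given x
    and membership in component c (under the assumptions above)."""
    cond_mean = c["b0"] + c["b1"] * x
    atypical = c["prop_outliers"] * gaussian_pdf(y, cond_mean, c["infl_y"] * c["var_y"])
    return atypical / contaminated_pdf(y, cond_mean, c["var_y"], c["prop_outliers"], c["infl_y"])

# Illustrative parameter values only; they are not taken from the paper.
components = [
    {"weight": 0.6, "b0": 0.0, "b1": 1.0, "var_y": 1.0, "prop_outliers": 0.05,
     "infl_y": 10.0, "mu_x": -2.0, "var_x": 1.0, "prop_leverage": 0.05, "infl_x": 10.0},
    {"weight": 0.4, "b0": 3.0, "b1": -1.0, "var_y": 0.5, "prop_outliers": 0.10,
     "infl_y": 5.0, "mu_x": 2.0, "var_x": 1.5, "prop_leverage": 0.02, "infl_x": 8.0},
]

print(cwm_density(-2.0, -2.0, components))
print(outlier_probability(-2.0, 6.0, components[0]))
```

Setting both proportions to zero in every component reduces this sketch to a Gaussian CWM, which is why the contaminated model can be read as adding four extra parameters per component. In the sketch, a response far from its component's regression line receives an outlier probability close to one, while a response on the line receives a small one.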
