Clustering data with values missing at random using scale mixtures of multivariate skew-normal distributions
Jason Pillay, Cristina Tortora, Antonio Punzo, Andriette Bekker

TL;DR
This paper develops a flexible clustering method using scale mixtures of multivariate skew-normal distributions that effectively handles missing data under a missing at random mechanism, capturing skewness and heavy tails.
Contribution
It extends the FMSMSN family to incomplete data, deriving properties and an EM algorithm, enabling robust clustering with skewed, heavy-tailed data and missing values.
Findings
Demonstrates improved clustering performance with missing data
Provides closed-form expressions for missing data imputation
Shows applicability to real-world CO2 emissions data
Abstract
Handling missing data is a major challenge in model-based clustering, especially when the data exhibit skewness and heavy tails. We address this by extending the finite mixture of scale mixtures of multivariate skew-normal (FMSMSN) family to accommodate incomplete data under a missing at random (MAR) mechanism. Unlike previous work that is limited to one of the special cases of the FMSMSN family, our method offers a cluster analysis methodology for the entire family that accounts for skewness and excess kurtosis in data with missing values. The multivariate skew-normal distribution, as parameterised by \cite{azzalini1996} and \cite{arnoldbeaver}, includes the normal distribution as a special case, which ensures that our method remains compatible with existing symmetric model-based clustering techniques that assume normality. We derive the distributional properties of the missing…
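To make the "normal as a special case" remark concrete, here is a short sketch using the standard location-scale form of the Azzalini multivariate skew-normal density. The symbols $\xi$, $\Omega$, $\omega$, and $\alpha$ are assumed notation for illustration and may differ from the parameterisation used in the paper itself.

```latex
% Assumed notation: \xi is a location vector, \Omega a positive-definite
% scale matrix, \omega = \mathrm{diag}(\Omega)^{1/2} the diagonal matrix of
% marginal scales, and \alpha the skewness (shape) vector.
\begin{align*}
  f(y;\xi,\Omega,\alpha)
    &= 2\,\phi_p(y-\xi;\Omega)\,
       \Phi\!\left(\alpha^{\top}\omega^{-1}(y-\xi)\right), \\
% Setting \alpha = 0 gives \Phi(0) = 1/2, so the skewing factor cancels:
  f(y;\xi,\Omega,0)
    &= 2\,\phi_p(y-\xi;\Omega)\cdot\tfrac{1}{2}
     = \phi_p(y-\xi;\Omega).
\end{align*}
```

Here $\phi_p(\,\cdot\,;\Omega)$ is the $p$-variate normal density with zero mean and covariance $\Omega$, and $\Phi$ is the univariate standard normal distribution function. With zero skewness the density is exactly multivariate normal, which is why a mixture built on this family contains the Gaussian mixture model as a special case.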
