Sleep pattern profiling using a finite mixture of contaminated multivariate skew-normal distributions on incomplete data
Jason Pillay, Cristina Tortora, Antonio Punzo, Andriette Bekker

TL;DR
This paper introduces a unified model-based clustering method that simultaneously manages missing data, outliers, and skewness in medical datasets, demonstrated on sleep pattern data without preprocessing.
Contribution
It proposes a finite mixture of contaminated multivariate skew-normal distributions, fitted with an EM algorithm, that jointly performs clustering, outlier detection, and missing-data handling.
Findings
Outperforms existing methods in accuracy and outlier detection
Successfully identifies meaningful sleep groups in incomplete data
Effective in high-dimensional and high-missingness scenarios
Abstract
Medical data often exhibit characteristics that make cluster analysis particularly challenging, such as missing values, outliers, and cluster features like skewness. Typically, such data would need to be preprocessed -- by cleaning outliers and missing values -- before clustering could be performed. However, these preliminary steps rely on objective functions different from those used in the clustering stage. In this paper, we propose a unified model-based clustering approach that simultaneously handles atypical observations, missing values, and cluster-wise skewness within a single framework. Each cluster is modelled using a contaminated multivariate skew-normal distribution -- a convenient two-component mixture of multivariate skew-normal densities -- in which one component represents the main data (the "bulk") and the other captures potential outliers. From an inferential…
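To make the "bulk plus outliers" construction in the abstract concrete, here is a minimal sketch of one cluster's density. It rests on assumptions that the abstract does not confirm: an Azzalini-type skew-normal parametrization (location `xi`, scale matrix `omega`, skewness vector `alpha`), an outlier component that shares `xi` and `alpha` with the bulk but has its scale inflated by a factor `eta > 1`, and a bulk proportion `delta`. All function and parameter names are hypothetical, and missing values are not handled here.

```python
import math
import numpy as np


def mvn_pdf(x, mu, sigma):
    """Multivariate normal density evaluated at a single point x."""
    x, mu, sigma = (np.asarray(a, dtype=float) for a in (x, mu, sigma))
    diff = x - mu
    maha = diff @ np.linalg.solve(sigma, diff)  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(sigma)
    d = mu.shape[0]
    return math.exp(-0.5 * (d * math.log(2.0 * math.pi) + logdet + maha))


def std_normal_cdf(z):
    """Univariate standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def msn_pdf(x, xi, omega, alpha):
    """Skew-normal density, assuming the Azzalini form 2 * phi_d(x) * Phi(alpha' z)."""
    x, xi, omega, alpha = (np.asarray(a, dtype=float) for a in (x, xi, omega, alpha))
    z = (x - xi) / np.sqrt(np.diag(omega))  # marginally standardized residual
    return 2.0 * mvn_pdf(x, xi, omega) * std_normal_cdf(float(alpha @ z))


def contaminated_msn_pdf(x, xi, omega, alpha, delta, eta):
    """Two-component mixture: bulk (weight delta) plus scale-inflated outlier part."""
    omega = np.asarray(omega, dtype=float)
    bulk = msn_pdf(x, xi, omega, alpha)
    outlier = msn_pdf(x, xi, eta * omega, alpha)  # assumed: same xi and alpha
    return delta * bulk + (1.0 - delta) * outlier


def outlier_probability(x, xi, omega, alpha, delta, eta):
    """Posterior probability that x came from the outlier component."""
    omega = np.asarray(omega, dtype=float)
    bulk = delta * msn_pdf(x, xi, omega, alpha)
    outlier = (1.0 - delta) * msn_pdf(x, xi, eta * omega, alpha)
    return outlier / (bulk + outlier)
```

Under these assumptions, a point near the cluster location gets a small outlier probability and a point far from it gets a probability close to one, which is the mechanism that allows outliers to be flagged during clustering instead of in a separate cleaning step.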
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Sleep and related disorders
