Robust and sparse k-means clustering for high-dimensional data
Sarka Brodinova, Peter Filzmoser, Thomas Ortner, Christian Breiteneder, Maia Zaharieva

TL;DR
This paper introduces a robust, sparse k-means clustering method that effectively identifies groups, outliers, and relevant variables in high-dimensional data without prior knowledge, using a weighted and penalized approach.
Contribution
It proposes a novel k-means-based algorithm with automatic observation weighting and a lasso-type penalty to handle noise and outliers in high-dimensional clustering.
Findings
Successfully identifies groups, outliers, and informative variables in simulated data.
Outperforms existing methods on real-world datasets.
Provides a framework for selecting the number of clusters and variables.
Abstract
In real-world application scenarios, the identification of groups poses a significant challenge due to possibly occurring outliers and existing noise variables. Therefore, there is a need for a clustering method which is capable of revealing the group structure in data containing both outliers and noise variables without any pre-knowledge. In this paper, we propose a k-means-based algorithm incorporating a weighting function which leads to an automatic weight assignment for each observation. In order to cope with noise variables, a lasso-type penalty is used in an objective function adjusted by observation weights. We finally introduce a framework for selecting both the number of clusters and variables based on a modified gap statistic. The conducted experiments on simulated and real-world data demonstrate the advantage of the method to identify groups, outliers, and informative…
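To make the combination of observation weights and a lasso-type penalty concrete, here is a minimal Python sketch of one variable-weighting step. It is not the authors' implementation: the abstract does not specify the weighting function, so the observation weights `obs_w` are simply taken as given, and the variable-weight update follows the soft-thresholding scheme commonly used in sparse k-means (maximize the weighted between-cluster sum of squares subject to an L2 bound of 1 and an L1 bound `s`). The function names and the bound `s` are assumptions made for illustration.

```python
import numpy as np


def weighted_bcss(X, labels, obs_w):
    """Per-variable between-cluster sum of squares using observation weights.

    Assumption: obs_w are non-negative weights supplied by some robust
    weighting function; observations with weight 0 are effectively ignored.
    """
    mu = np.average(X, axis=0, weights=obs_w)
    total = (obs_w[:, None] * (X - mu) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for k in np.unique(labels):
        m = labels == k
        if obs_w[m].sum() == 0:
            continue  # cluster holds only zero-weight observations
        mu_k = np.average(X[m], axis=0, weights=obs_w[m])
        within += (obs_w[m][:, None] * (X[m] - mu_k) ** 2).sum(axis=0)
    return total - within


def lasso_variable_weights(bcss, s, n_iter=40):
    """Variable weights w >= 0 with ||w||_2 = 1 and ||w||_1 <= s.

    Assumes 1 <= s <= sqrt(p) and at least one positive entry in bcss.
    The threshold delta is found by bisection.
    """
    a = np.maximum(bcss, 0.0)

    def normalized(delta):
        w = np.maximum(a - delta, 0.0)  # soft-thresholding of non-negative a
        return w / np.linalg.norm(w)

    w = normalized(0.0)
    if w.sum() <= s:
        return w  # L1 bound already satisfied, no shrinkage needed
    lo, hi = 0.0, a.max()
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if normalized(mid).sum() > s:
            lo = mid
        else:
            hi = mid
    return normalized((lo + hi) / 2)
```

In a full procedure one would alternate between assigning clusters on the variable-weighted data, updating the observation weights, and recomputing the variable weights; a smaller `s` drives more variable weights to exactly zero, which is how noise variables are screened out.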
