Adaptively Robust and Sparse K-means Clustering
Hao Li, Shonosuke Sugasawa, Shota Katayama

TL;DR
This paper introduces ARSK, a novel clustering method that enhances K-means by improving robustness to outliers and selecting informative variables in high-dimensional data, using penalized error components and weights.
Contribution
It proposes a new adaptively robust and sparse K-means algorithm that adds a penalized error component for each observation and penalized variable weights, with tuning parameters selected by Gap statistics, to handle outliers and noisy variables simultaneously.
Findings
ARSK outperforms existing algorithms in simulation experiments and real data analysis.
It effectively identifies clusters in data contaminated by outliers.
It selects informative variables in high-dimensional data.
Abstract
While K-means is known to be a standard clustering algorithm, its performance may be compromised due to the presence of outliers and high-dimensional noisy variables. This paper proposes adaptively robust and sparse K-means clustering (ARSK) to address these practical limitations of the standard K-means algorithm. For robustness, we introduce a redundant error component for each observation, and this additional parameter is penalized using a group sparse penalty. To accommodate the impact of high-dimensional noisy variables, the objective function is modified by incorporating weights and implementing a penalty to control the sparsity of the weight vector. The tuning parameters to control the robustness and sparsity are selected by Gap statistics. Through simulation experiments and real data analysis, we demonstrate the proposed method's superiority to existing algorithms in identifying…
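The robustness idea in the abstract (one error component per observation, shrunk by a group sparse penalty) can be illustrated with a minimal sketch. This is not the authors' ARSK implementation. It assumes a plain group-lasso penalty and the objective 0.5 * sum_i ||x_i - e_i - mu_c(i)||^2 + lam * sum_i ||e_i||_2, which may differ from the penalty used in the paper, and it omits the variable weights and the Gap-statistic tuning entirely. The names `group_soft_threshold`, `robust_kmeans_sketch`, `lam`, and `init_centers` are hypothetical.

```python
import numpy as np

def group_soft_threshold(R, lam):
    """Shrink each row of R toward zero by lam in L2 norm (zero if norm <= lam)."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return R * scale

def robust_kmeans_sketch(X, k, lam, n_iter=50, init_centers=None, seed=0):
    """Illustrative robust K-means with per-observation error components.

    Assumption: group-lasso penalty on the rows of E; not the paper's exact method.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if init_centers is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init_centers, dtype=float)
    E = np.zeros_like(X)  # error components, one row per observation
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        Z = X - E  # data with the current outlier shifts removed
        # Assignment step: nearest center for each corrected observation.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Center step: mean of corrected observations in each cluster.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = Z[mask].mean(axis=0)
        # Error step: closed-form minimizer of the assumed penalized objective.
        E = group_soft_threshold(X - centers[labels], lam)
    return labels, centers, E
```

Under this sketch, observations whose residual norm stays below `lam` keep a zero error component, so the nonzero rows of `E` flag candidate outliers. Larger `lam` flags fewer points, and as `lam` grows the procedure reduces to standard K-means.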
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition
Methods: k-Means Clustering
