Sparse $K$-spatial-median clustering for high-dimensional data
Ping Zhao, Dan Zhuang, Long Feng

TL;DR
This paper introduces a robust high-dimensional clustering method that combines spatial-median center updates, hard feature exclusion, and an optional robust Mahalanobis-type metric, achieving competitive accuracy and improved stability relative to existing $K$-means methods.
Contribution
It develops a novel sparse $K$-spatial-median clustering framework that enhances robustness and scalability for high-dimensional data with irrelevant features.
Findings
The proposed method achieves competitive accuracy in simulations.
It improves stability over existing $K$-means methods.
Automatic feature exclusion enhances performance in high-dimensional settings.
Abstract
We propose a robust clustering framework for high-dimensional data with heavy tails and a large fraction of irrelevant variables. The method replaces the mean updates of Lloyd's $K$-means with \emph{spatial medians} to enhance robustness. For the assignment step, it admits either a Euclidean rule for computational simplicity or a robust Mahalanobis-type metric constructed from the spatial sign covariance matrix to account for heterogeneous scales and feature dependence. To handle the high-dimensional regime, we further introduce a simple \emph{hard feature-exclusion} mechanism that removes weakly separating dimensions based on across-center dispersion, with the exclusion threshold selected automatically via a permutation-based Gap criterion. Simulation studies under correlated Gaussian and multivariate $t$ models demonstrate that the proposed approach provides competitive clustering accuracy…
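The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Euclidean assignment rule, computes spatial medians with Weiszfeld's iteration (a standard algorithm that the abstract does not name), and measures across-center dispersion as the per-feature standard deviation of the centers. The function names `spatial_median`, `k_spatial_median`, and `exclude_features` are hypothetical, the exclusion threshold is passed in by hand rather than selected by the permutation-based Gap criterion, and the Mahalanobis-type metric is omitted.

```python
import numpy as np

def spatial_median(X, n_iter=100, tol=1e-8):
    """Approximate the spatial (geometric) median of the rows of X
    with Weiszfeld's iteration (assumed solver, not from the paper)."""
    m = X.mean(axis=0)  # start from the mean
    for _ in range(n_iter):
        d = np.linalg.norm(X - m, axis=1)
        w = 1.0 / np.maximum(d, 1e-12)  # crude guard if m hits a data point
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def k_spatial_median(X, k, n_iter=50, seed=0):
    """Lloyd-style alternation with Euclidean assignment and
    spatial-median center updates."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct data points (an assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: nearest center in Euclidean distance.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: spatial median of each non-empty cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = spatial_median(members)
    return labels, centers

def exclude_features(centers, threshold):
    """Return a boolean mask of features to keep: those whose dispersion
    across centers exceeds `threshold` (dispersion measure is assumed)."""
    dispersion = centers.std(axis=0)
    return dispersion > threshold
```

A typical use would cluster once, call `exclude_features` on the fitted centers, and re-run `k_spatial_median` on the retained columns `X[:, mask]`. How the paper interleaves exclusion with the clustering iterations is not stated in the abstract, so that ordering is also an assumption.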
