Sparse $K$-spatial-median clustering for high-dimensional data

Ping Zhao; Dan Zhuang; Long Feng

arXiv:2605.00598·stat.ME·May 4, 2026


TL;DR

This paper introduces a robust high-dimensional clustering method that combines spatial-median center updates, hard feature exclusion, and an optional robust Mahalanobis-type metric. In simulations it is competitive in accuracy with, and more stable than, $K$-means-based approaches.

Contribution

It develops a novel sparse $K$-spatial-median clustering framework that enhances robustness and scalability for high-dimensional data with irrelevant features.

Findings

01

The proposed method achieves competitive accuracy in simulations.

02

It improves stability over existing $K$-means methods.

03

Automatic feature exclusion enhances performance in high-dimensional settings.

Abstract

We propose a robust clustering framework for high-dimensional data with heavy tails and a large fraction of irrelevant variables. The method replaces the mean updates of Lloyd's $K$-means with spatial medians to enhance robustness. For the assignment step, it admits either a Euclidean rule for computational simplicity or a robust Mahalanobis-type metric constructed from the spatial sign covariance matrix to account for heterogeneous scales and feature dependence. To handle the $p \gg n$ regime, we further introduce a simple hard feature-exclusion mechanism that removes weakly separating dimensions based on across-center dispersion, with the exclusion threshold selected automatically via a permutation-based Gap criterion. Simulation studies under correlated Gaussian and multivariate $t$ models demonstrate that the proposed approach provides competitive clustering accuracy…
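The core loop described in the abstract (Lloyd-style alternation with spatial medians in place of means, using the Euclidean assignment rule) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `spatial_median` and `k_spatial_median` and the `init` parameter are hypothetical, the spatial median is computed with Weiszfeld iterations (a standard choice that the abstract does not confirm), and the Mahalanobis-type metric, feature exclusion, and Gap-based threshold selection are all omitted.

```python
import numpy as np

def spatial_median(X, n_iter=100, tol=1e-8, eps=1e-12):
    """Approximate the spatial (geometric) median of the rows of X.

    Uses Weiszfeld iterations; this solver is an assumption of the sketch.
    """
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        d = np.linalg.norm(X - m, axis=1)
        w = 1.0 / np.maximum(d, eps)  # guard against division by zero
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def k_spatial_median(X, k, init=None, n_iter=50, seed=0):
    """Lloyd-style clustering with spatial-median center updates.

    `init` (hypothetical parameter) optionally supplies k starting centers;
    otherwise k distinct data points are drawn at random.
    """
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: nearest center in Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: spatial median of each non-empty cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = spatial_median(members)
    return labels, centers
```

A call such as `labels, centers = k_spatial_median(X, k=3)` returns one integer label per row of `X` and the fitted centers. Because each center is a spatial median rather than a mean, a single extreme observation in a cluster shifts its center far less than it would under $K$-means.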
