Simple KNN-Based Outlier Detection Achieves Robust Clustering

Tianle Jiang; Yufa Zhou

arXiv:2605.07130 · cs.LG · May 11, 2026

TL;DR

This paper shows that a simple KNN-based heuristic, which removes points with large K-nearest-neighbor distances as outliers, is effective for robust clustering, matching or surpassing more complex methods on real-world datasets.

Contribution

The paper proves that removing the points with the largest KNN distances achieves approximation guarantees for robust k-Means that are competitive with prior work, under a practical assumption on the sizes of the optimal clusters.
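As a rough illustration of the heuristic, the following NumPy sketch scores each point by its distance to its K-th nearest neighbor, drops the z highest-scoring points, and runs a basic Lloyd's k-Means on the rest. Assumptions: the exact score (distance to the K-th neighbor rather than, for example, the mean of the K neighbor distances), the brute-force distance computation, the random initialization, and all function names are illustrative choices, not the paper's implementation.

```python
import numpy as np


def knn_outlier_scores(X: np.ndarray, K: int) -> np.ndarray:
    """Distance from each point to its K-th nearest other point (assumed score)."""
    # Brute-force pairwise Euclidean distances: O(n^2) memory, fine for a sketch.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Each sorted row starts with the point's zero distance to itself,
    # so column K holds the distance to the K-th nearest other point.
    return np.sort(dists, axis=1)[:, K]


def remove_knn_outliers(X: np.ndarray, K: int, z: int) -> np.ndarray:
    """Boolean mask keeping every point except the z with the largest KNN score."""
    scores = knn_outlier_scores(X, K)
    keep = np.ones(len(X), dtype=bool)
    if z > 0:
        keep[np.argsort(scores)[-z:]] = False
    return keep


def lloyd_kmeans(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's iterations, standing in for any standard k-Means solver."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers


# Usage: two well-separated blobs plus three far-away outliers.
rng = np.random.default_rng(1)
inliers = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(10.0, 0.5, size=(50, 2)),
])
outliers = np.array([[100.0, 100.0], [-100.0, 50.0], [50.0, -100.0]])
X = np.vstack([inliers, outliers])

keep = remove_knn_outliers(X, K=5, z=3)   # drops the three planted outliers
centers = lloyd_kmeans(X[keep], k=2)      # one center per blob
```

Because the sketch computes all pairwise distances, it only suits small inputs; a KD-tree or an approximate nearest-neighbor index would be the natural substitute for larger data.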

Findings

1. Under an assumption on the optimal cluster sizes, KNN-based outlier removal yields a constant-factor reduction from robust k-Means to standard k-Means.

2. Empirically, the heuristic performs comparably to or better than more complex algorithms.

3. Simple heuristics can effectively bridge outlier detection and robust clustering.
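The first finding is stated in terms of the robust k-Means objective. A small helper makes that objective concrete: once the centers are fixed, the cost-minimizing choice of z outliers is simply the z points farthest from their nearest center, so the cost can be evaluated by sorting. The function name `robust_kmeans_cost` is hypothetical, and the sketch only evaluates the objective; it does not reproduce the paper's approximation analysis.

```python
import numpy as np


def robust_kmeans_cost(X: np.ndarray, centers: np.ndarray, z: int) -> float:
    """k-Means cost after discarding the z points farthest from their nearest center.

    For fixed centers, dropping the z largest squared distances is the optimal
    choice of outliers, so no search over outlier sets is needed.
    """
    # Squared distance from every point to its nearest center.
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
    # Keep the n - z smallest contributions and sum them.
    kept = np.sort(sq)[: len(X) - z]
    return float(kept.sum())


# Usage: one far-away point dominates the plain k-Means cost.
X = np.array([[0.0], [2.0], [10.0], [100.0]])
centers = np.array([[1.0], [10.0]])
print(robust_kmeans_cost(X, centers, z=0))  # 8102.0
print(robust_kmeans_cost(X, centers, z=1))  # 2.0
```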

Abstract

Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the robust $k$-Means problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means cost on the remaining points. Despite the close connection between robust $k$-Means and outlier detection, both theoretical and empirical understanding of the effectiveness of classic outlier detection heuristics for robust $k$-Means remains limited. In this paper, we prove that under a practical assumption on the optimal cluster sizes, simply removing points with large $K$-Nearest-Neighbor distances achieves performance comparable to prior work in terms of approximation guarantees: it yields a constant-factor reduction from robust $k$-Means to standard $k$-Means, without introducing additional centers or discarding extra outliers, as is…
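Written out, the robust $k$-Means problem described in the abstract asks for $k$ centers and a set of $z$ discarded points that together minimize the squared-distance cost of the rest. The notation below ($X \subset \mathbb{R}^d$ for the $n$ input points, $C$ for the centers, $Z$ for the outliers) is chosen here for illustration and may differ from the paper's:

```latex
\min_{\substack{C \subset \mathbb{R}^d,\ |C| = k \\ Z \subseteq X,\ |Z| = z}}
\;\sum_{x \in X \setminus Z} \min_{c \in C} \lVert x - c \rVert_2^2
```

Setting $z = 0$ recovers the standard $k$-Means objective, which is the problem the reduction maps robust $k$-Means to.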
