Simple KNN-Based Outlier Detection Achieves Robust Clustering
Tianle Jiang, Yufa Zhou

TL;DR
This paper demonstrates that a simple KNN-based heuristic effectively addresses robust clustering by removing outliers, matching or surpassing more complex methods on real-world datasets.
Contribution
It proves that removing points with large KNN distances achieves competitive approximation guarantees for robust k-Means under practical assumptions.
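A minimal sketch of this remove-then-cluster pipeline follows. It makes several assumptions. A point's KNN score is taken to be its Euclidean distance to its m-th nearest other point. The number of outliers z is taken to be known. Plain Lloyd iterations with deterministic farthest-first seeding stand in for the k-Means solver. The names knn_scores, remove_knn_outliers, farthest_first and lloyd_kmeans and the parameter m are hypothetical, and the paper's exact score, neighbor count and solver may differ. The code requires Python 3.8+ for math.dist.

```python
import math


def knn_scores(points, m):
    """Assumed KNN score: distance from each point to its m-th nearest other point (brute force, O(n^2))."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[m - 1])
    return scores


def remove_knn_outliers(points, m, z):
    """Drop the z points with the largest KNN score; return (inliers, sorted outlier indices)."""
    scores = knn_scores(points, m)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    outliers = set(ranked[:z])
    inliers = [p for i, p in enumerate(points) if i not in outliers]
    return inliers, sorted(outliers)


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def lloyd_kmeans(points, k, iters=50):
    """Plain Lloyd iterations; any standard k-Means solver could be substituted here."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # New center = coordinate-wise mean; an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
        (50.0, -40.0)]  # the last point is an obvious outlier
inliers, outliers = remove_knn_outliers(data, m=2, z=1)
print(outliers)  # [8]
centers = lloyd_kmeans(inliers, k=2)
```

The brute-force neighbor search is quadratic in the number of points. At scale, a KD-tree or an approximate nearest-neighbor index would be the usual replacement.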
Findings
KNN-based outlier removal yields a constant-factor reduction from robust k-Means to standard k-Means, under an assumption on the optimal cluster sizes.
Empirical results show performance superior or comparable to that of more complex algorithms.
Simple heuristics can effectively bridge outlier detection and robust clustering.
Abstract
Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the robust k-Means problem (i.e., k-Means with outliers), the goal is to remove outliers and minimize the k-Means cost on the remaining points. Despite the close connection between robust k-Means and outlier detection, both theoretical and empirical understanding of the effectiveness of outlier detection for robust k-Means remains limited. In this paper, we prove that under a practical assumption on the optimal cluster sizes, simply removing points with large k-Nearest-Neighbor distances achieves performance comparable to prior work in terms of approximation guarantees: it yields a constant-factor reduction from robust k-Means to standard k-Means, without introducing additional centers or discarding extra outliers, as is…
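The robust k-Means objective can be stated as follows. Given a point set P, k centers C and an outlier set Z with |Z| ≤ z, the goal is to minimize the sum over p in P \ Z of min over c in C of ||p − c||². For fixed centers, the best Z is simply the z points farthest from their nearest center, so the cost of any candidate solution is easy to evaluate. This is a small sketch assuming squared Euclidean distance. The function name robust_kmeans_cost is chosen for illustration only.

```python
import math


def robust_kmeans_cost(points, centers, z):
    """k-Means cost after discarding the z points farthest from their nearest center."""
    # Squared distance from each point to its closest center, in ascending order.
    sq = sorted(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    # For fixed centers, the optimal outlier set is the z largest entries; drop them.
    return sum(sq[: max(len(sq) - z, 0)])


pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(robust_kmeans_cost(pts, [(1.0, 0.0)], z=0))  # 83.0
print(robust_kmeans_cost(pts, [(1.0, 0.0)], z=1))  # 2.0
```

Discarding the single far point drops the cost from 83 to 2. This is why a good outlier-removal step matters before running a standard k-Means solver.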
