The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It

Roy R. Lederman; David Silva-Sánchez; Ziling Chen; Gilles Mordant; Amnon Balanov; Tamir Bendory

arXiv:2602.09936·stat.ML·February 11, 2026

Open Access

TL;DR

This paper proves that Lloyd's k-means algorithm fails catastrophically on high-dimensional, noisy data, often returning its initial partition unchanged, whereas Hartigan's algorithm avoids this failure. The result offers a theoretical explanation for the empirical difficulties observed with k-means in high dimensions.

Contribution

The paper proves that Lloyd's k-means algorithm fails in high-dimensional, high-noise settings while Hartigan's algorithm does not, which shows that the choice of algorithm matters for high-dimensional clustering.

Findings

01

Lloyd's k-means often returns its initial partition unchanged in high-dimensional, high-noise settings

02

Hartigan's k-means avoids the catastrophic failure in high-dimensional settings

03

The analysis offers a theoretical explanation for the empirical difficulties of k-means in high dimensions

Abstract

Lloyd's k-means algorithm is one of the most widely used clustering methods. We prove that in high-dimensional, high-noise settings, the algorithm exhibits catastrophic failure: with high probability, essentially every partition of the data is a fixed point. Consequently, Lloyd's algorithm simply returns its initial partition, even when the underlying clusters are trivially recoverable by other methods. In contrast, we prove that Hartigan's k-means algorithm does not exhibit this pathology. Our results show the stark difference between these algorithms and offer a theoretical explanation for the empirical difficulties often observed with k-means in high dimensions.
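The contrast described in the abstract can be illustrated with a small simulation. The following is a minimal sketch in plain Python, not the authors' code or experimental setup: the data model (20 points in 2000 dimensions, unit Gaussian noise, two clusters separated only along the first coordinate), the parameter values, and all function names are assumptions chosen for illustration. Both algorithms start from the same random balanced partition; Lloyd's update reassigns every point to its nearest centroid, while Hartigan's update moves a single point only when the move strictly lowers the k-means objective.

```python
import random

def sqdist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def centroids(X, labels, k):
    """Return (cluster means, cluster sizes); an empty cluster gets a zero mean."""
    d = len(X[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for x, c in zip(X, labels):
        counts[c] += 1
        row = sums[c]
        for j, v in enumerate(x):
            row[j] += v
    return [[s / max(n, 1) for s in row] for row, n in zip(sums, counts)], counts

def objective(X, labels, k):
    """k-means objective: total squared distance to the assigned centroid."""
    cents, _ = centroids(X, labels, k)
    return sum(sqdist(x, cents[c]) for x, c in zip(X, labels))

def lloyd(X, labels, k, max_iter=50):
    """Lloyd's algorithm started from a given partition."""
    labels = list(labels)
    for _ in range(max_iter):
        cents, _ = centroids(X, labels, k)
        new = [min(range(k), key=lambda c: sqdist(x, cents[c])) for x in X]
        if new == labels:  # fixed point reached
            break
        labels = new
    return labels

def hartigan(X, labels, k, max_passes=50):
    """Hartigan's algorithm: single-point moves that strictly lower the objective."""
    labels = list(labels)
    for _ in range(max_passes):
        moved = False
        for i, x in enumerate(X):
            # Recomputed from scratch for clarity; an efficient version updates in place.
            cents, counts = centroids(X, labels, k)
            a = labels[i]
            if counts[a] == 1:  # never empty a cluster
                continue
            # Objective decrease from removing x from its current cluster a.
            best, best_cost = a, counts[a] / (counts[a] - 1) * sqdist(x, cents[a])
            for b in range(k):
                if b != a:
                    # Objective increase from adding x to cluster b.
                    cost = counts[b] / (counts[b] + 1) * sqdist(x, cents[b])
                    if cost < best_cost:
                        best, best_cost = b, cost
            if best != a:
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels

def make_data(n_per=10, d=2000, sep=10.0, seed=0):
    """Two clusters separated along coordinate 0, plus unit Gaussian noise (assumed model)."""
    rng = random.Random(seed)
    truth = [0] * n_per + [1] * n_per
    X = []
    for t in truth:
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x[0] += sep / 2 if t == 1 else -sep / 2
        X.append(x)
    init = truth[:]
    rng.shuffle(init)  # random balanced initial partition
    return X, truth, init

X, truth, init = make_data()
lloyd_labels = lloyd(X, init, 2)
hartigan_labels = hartigan(X, init, 2)
print("Lloyd returned the initial partition:", lloyd_labels == init)
print("Hartigan changed the partition:", hartigan_labels != init)
print("Objective at init / after Hartigan:",
      objective(X, init, 2), objective(X, hartigan_labels, 2))
```

In this toy setup the sign of the first coordinate alone separates the two clusters, yet Lloyd's algorithm would be expected to stop at its starting partition. A heuristic reading, offered here as an illustration rather than as the paper's proof: each centroid already contains the point being tested, so a point looks closer to its own centroid by a margin that grows with the dimension and can swamp the signal. Hartigan's size-dependent weights `m/(m-1)` and `m/(m+1)` cancel that self-inclusion advantage, so moves are decided by the actual change in the objective. The sketch makes no claim that Hartigan's algorithm recovers the planted clusters at these particular parameter values.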

Taxonomy

Topics: Advanced Clustering Algorithms Research · Distributed systems and fault tolerance · Stochastic Gradient Optimization Techniques