An effective variant of the Hartigan $k$-means algorithm

François Clément; Stefan Steinerberger

arXiv:2604.21798 · cs.LG · April 24, 2026

TL;DR

This paper introduces a minor variation of Hartigan's k-means algorithm that consistently improves clustering results by an additional 2-5%, especially in higher dimensions or with larger k.

Contribution

The paper proposes a simple modification to Hartigan's algorithm that improves clustering quality by a further 2-5% beyond the standard Hartigan method.

Findings

01

Hartigan's algorithm outperforms Lloyd's in most cases.

02

A minor variation of Hartigan's method improves results by 2-5%.

03

Improvements are more significant with higher dimensions or larger k.

Abstract

The $k$-means problem is perhaps the classical clustering problem and is often treated as synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases; Telgarsky-Vattani note a typical improvement of $5\%$--$10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$--$5\%$ improvement; the improvement tends to become larger when either the dimension or $k$ increases.
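For orientation, here is a minimal sketch of the baseline Hartigan single-point reassignment rule as it is commonly described. It is not the variant proposed in the paper, which the abstract does not spell out. The function names `kmeans_cost` and `hartigan_kmeans`, the `max_sweeps` parameter, and the numerical tolerance are illustrative assumptions, not taken from the source.

```python
import numpy as np

def kmeans_cost(X, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    cost = 0.0
    for j in range(k):
        pts = X[labels == j]
        if len(pts):
            cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

def hartigan_kmeans(X, labels, k, max_sweeps=100):
    """Textbook Hartigan reassignment (assumed baseline, not the paper's variant).

    A point x in cluster a (size n_a > 1) moves to cluster b when
        n_b / (n_b + 1) * |x - c_b|^2  <  n_a / (n_a - 1) * |x - c_a|^2,
    i.e. when the move strictly lowers the k-means cost.
    """
    labels = np.asarray(labels).copy()
    counts = np.bincount(labels, minlength=k).astype(float)
    sums = np.zeros((k, X.shape[1]))
    np.add.at(sums, labels, X)  # per-cluster coordinate sums
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(X)):
            x, a = X[i], labels[i]
            if counts[a] <= 1:
                continue  # do not empty a cluster
            centers = sums / np.maximum(counts, 1.0)[:, None]
            d2 = ((centers - x) ** 2).sum(axis=1)
            # Cost saved by removing x from its current cluster a.
            removal_gain = counts[a] / (counts[a] - 1.0) * d2[a]
            # Cost added by inserting x into each other cluster.
            insertion_cost = counts / (counts + 1.0) * d2
            insertion_cost[a] = np.inf
            b = int(np.argmin(insertion_cost))
            if insertion_cost[b] < removal_gain - 1e-12:
                sums[a] -= x
                sums[b] += x
                counts[a] -= 1
                counts[b] += 1
                labels[i] = b
                moved = True
        if not moved:
            break  # local optimum: no single-point move helps
    return labels
```

Unlike Lloyd's algorithm, which reassigns every point to its nearest center and then recomputes all means, this rule accounts for how both affected means shift when a single point moves, so each accepted move strictly decreases the cost. A typical call passes a data matrix `X` of shape `(n, d)` together with an initial integer labeling, for example one drawn at random or taken from a Lloyd run.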
