Accelerating k-Means Clustering with Cover Trees

Andreas Lang; Erich Schubert

arXiv:2410.15117 · cs.LG · October 22, 2024

TL;DR

This paper introduces a k-means acceleration method built on the cover tree index. It has relatively low overhead and performs well over a wider parameter range than earlier k-d tree approaches. A hybrid variant adds upper and lower bounds filtering on top of the tree-based aggregation.

Contribution

The paper presents a new k-means algorithm leveraging cover trees, offering improved performance over existing methods that use k-d trees, especially across wider parameter ranges.

Findings

01

Cover tree-based k-means outperforms k-d tree methods.

02

Hybrid approach combining cover trees with bounds filtering enhances efficiency.

03

Method reduces distance computations in clustering.

Abstract

The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster distances and the triangle inequality to reduce the number of distance computations, with only arrays as underlying data structures. These approaches cannot exploit that nearby points are likely assigned to the same cluster. We propose a new k-means algorithm based on the cover tree index that has relatively low overhead and performs well over a wider parameter range than previous approaches based on the k-d tree. By combining this with upper and lower bounds, as in state-of-the-art approaches, we obtain a hybrid algorithm that combines the benefits of tree aggregation and bounds-based filtering.
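The two ideas the abstract combines can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names `assign_ball` and `update_bounds` are hypothetical, a tree node is assumed to be summarized by a ball (a center and a radius covering all of its points), and the bound update follows the general style of Hamerly-type methods with one upper and one lower bound per point. The sketch only shows why the triangle inequality allows a whole group of nearby points, or a single point with valid bounds, to skip distance computations.

```python
import numpy as np


def assign_ball(center, radius, centroids):
    """Return the centroid index that owns every point in the ball, or None.

    Assumes at least two centroids. For any point x in the ball, the triangle
    inequality gives d(x, c1) <= d1 + radius and d(x, cj) >= dj - radius, so the
    whole ball belongs to the nearest centroid when d1 + radius <= d2 - radius.
    """
    d = np.linalg.norm(centroids - center, axis=1)
    order = np.argsort(d)
    nearest, second = order[0], order[1]
    if d[nearest] + radius <= d[second] - radius:
        return int(nearest)
    return None  # ambiguous: the node's children or points must be inspected


def update_bounds(upper, lower, assigned, shift):
    """Keep per-point bounds valid after centroids moved by `shift` (one value per centroid).

    `upper` bounds the distance to the assigned centroid, `lower` bounds the
    distance to every other centroid. Using the largest shift for the lower
    bound is conservative but always valid.
    """
    return upper + shift[assigned], lower - shift.max()


centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
owner_near = assign_ball(np.array([1.0, 0.0]), 1.0, centroids)  # ball close to centroid 0
owner_mid = assign_ball(np.array([5.0, 0.0]), 1.0, centroids)   # ball halfway between both

upper, lower = update_bounds(
    upper=np.array([1.0, 3.0]),
    lower=np.array([4.0, 3.5]),
    assigned=np.array([0, 1]),
    shift=np.array([0.5, 1.0]),
)
# Points whose upper bound does not exceed the lower bound keep their assignment
# without any new distance computation.
can_skip = upper <= lower
```

In this toy setup the first ball can be assigned to a single centroid as a unit, the second cannot, and only the first point passes the bounds filter. A hybrid scheme in the spirit of the abstract would try the node-level test first and fall back to per-point bounds where the node-level test is inconclusive.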

Taxonomy

Methods: k-Means Clustering