Accelerating Spherical k-Means

Erich Schubert; Andreas Lang; Gloria Feher

arXiv:2107.04074·cs.LG·November 2, 2021

TL;DR

This paper adapts and applies existing Euclidean k-means acceleration techniques to spherical k-means, significantly speeding up clustering of high-dimensional, sparse data by working directly with Cosine similarities.
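As a point of reference, the unaccelerated baseline can be sketched in a few lines. This is a minimal illustration, not the paper's or ELKI's implementation: the function names `normalize_rows` and `spherical_kmeans` are hypothetical, initialization is plain random sampling, and dense NumPy arrays are assumed even though real document vectors would normally be stored as sparse matrices. It shows why Cosine similarity is convenient here: once all vectors have unit length, assigning a point to its most similar center only needs dot products.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Plain (unaccelerated) spherical k-means on dense data (illustrative)."""
    rng = np.random.default_rng(seed)
    X = normalize_rows(np.asarray(X, dtype=float))
    # Assumption: initialize centers from k distinct, randomly chosen points.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # On unit vectors the dot product equals the Cosine similarity.
        sims = X @ centers.T
        new_labels = sims.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centers will not change
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                # New center: the normalized sum of the assigned vectors.
                centers[j] = normalize_rows(members.sum(axis=0, keepdims=True))[0]
    return labels, centers
```

Every iteration of this baseline computes all n × k similarities; the accelerated variants aim to skip most of those computations.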

Contribution

The paper adapts the Elkan and Hamerly accelerations, which rely on the triangle inequality of Euclidean distances, to spherical k-means by working directly with Cosine similarities instead of distances.

Findings

01 Significant speedup in spherical k-means clustering.

02 Effective acceleration on real high-dimensional data.

03 Demonstrated practical benefits over standard methods.

Abstract

Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors. While several improvements and accelerations have been introduced for the original k-means algorithm, not all easily translate to the spherical variant: many acceleration techniques, such as the algorithms of Elkan and Hamerly, rely on the triangle inequality of Euclidean distances. However, spherical k-means uses Cosine similarities instead of distances for computational efficiency. In this paper, we incorporate the Elkan and Hamerly accelerations into the spherical k-means algorithm, working directly with the Cosines instead of Euclidean distances, to obtain a substantial speedup, and we evaluate these spherical accelerations on real data.
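This page does not reproduce the paper's formulas, so the following is only a sketch of how a triangle inequality can be stated directly on Cosine similarities. It relies on a fact that holds for unit vectors: the angle between two vectors is a metric, so for angles a = angle(x, z) and b = angle(z, y) the similarity sim(x, y) lies between cos(a + b) and cos(a - b). The helper names `cosine_bounds` and `can_skip_center` are hypothetical, and the pruning rule is an illustrative Elkan-style test rather than the authors' exact procedure.

```python
import math

def cosine_bounds(sim_xz, sim_zy):
    """Lower and upper bound on sim(x, y) from sim(x, z) and sim(z, y).

    Assumes x, y, z are unit vectors. Uses
    cos(a + b) = cos(a)cos(b) - sin(a)sin(b) and
    cos(a - b) = cos(a)cos(b) + sin(a)sin(b),
    where sin(a)sin(b) = sqrt((1 - cos(a)^2) * (1 - cos(b)^2)).
    """
    prod = sim_xz * sim_zy
    # max(0, ...) guards against tiny negative values caused by rounding.
    root = math.sqrt(max(0.0, (1.0 - sim_xz ** 2) * (1.0 - sim_zy ** 2)))
    return prod - root, prod + root

def can_skip_center(sim_to_assigned, sim_between_centers):
    """Illustrative pruning test (hypothetical helper).

    If even the upper bound on the similarity to another center does not
    exceed the similarity to the currently assigned center, that other
    center cannot be a better match, so its similarity need not be computed.
    """
    _, upper = cosine_bounds(sim_to_assigned, sim_between_centers)
    return upper <= sim_to_assigned
```

In this sketch, a point with similarity 0.95 to its assigned center can skip any center whose similarity to the assigned center is only 0.2, because the upper bound on its similarity to that center is below 0.95.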

Code & Models

Repositories

elki-project/elki (Official)
