# Faster K-Means Cluster Estimation

**Authors:** Siddhesh Khandelwal, Amit Awekar

arXiv: 1701.04600 · 2017-01-18

## TL;DR

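This paper introduces a heuristic that speeds up K-means by restricting each data point's distance computations to a small set of candidate clusters predicted after the first iteration. The authors report speed-ups of up to 3x over efficient K-means variants with only a marginal increase in mean squared error (MSE).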

## Contribution

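The paper proposes a heuristic that, after the first iteration of K-means, predicts the small subset of nearby clusters among which a data point is likely to change membership. Later iterations compute distances only to those clusters rather than to every centroid, and the heuristic can be added to existing K-means variants.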

## Key findings

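- Achieves up to a 3x speed-up over efficient K-means variants on synthetic and real-world datasets.
- Incurs only a marginal increase in mean squared error (MSE) compared with the unmodified variants.
- Applies to several well-known K-means variants, which the authors augment with the heuristic.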

## Abstract

There has been considerable work on improving the popular clustering algorithm K-means in terms of both mean squared error (MSE) and speed. However, most K-means variants tend to compute the distance of each data point to each cluster centroid in every iteration. We propose a fast heuristic to overcome this bottleneck with only a marginal increase in MSE. We observe that across all iterations of K-means, a data point changes its membership only among a small subset of clusters. Our heuristic predicts such clusters for each data point by looking at nearby clusters after the first iteration of K-means. We augment well-known variants of K-means with our heuristic to demonstrate its effectiveness. For various synthetic and real-world datasets, our heuristic achieves a speed-up of up to 3 times when compared to efficient variants of K-means.
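The idea in the abstract can be illustrated with a minimal sketch on top of plain Lloyd-style K-means. This is not the authors' implementation. The function names (`candidate_kmeans`, `make_blobs`, `mse`), the `n_candidates` parameter, and the rule "keep the `n_candidates` nearest centroids from iteration 1" are all assumptions made for illustration, since this summary does not say how the paper builds its candidate lists.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of its points (empty clusters keep their old centroid)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(len(centroids))
    ]


def candidate_kmeans(points, init_centroids, n_candidates, max_iter=100):
    """Hypothetical sketch: after iteration 1, each point only checks its candidate clusters."""
    k = len(init_centroids)
    centroids = [list(c) for c in init_centroids]

    # Iteration 1: full distance computation against all k centroids.
    # Assumption: the candidate list is the n_candidates nearest centroids.
    candidates, labels = [], []
    for p in points:
        order = sorted(range(k), key=lambda c: sq_dist(p, centroids[c]))
        candidates.append(order[:n_candidates])
        labels.append(order[0])
    distance_evals = len(points) * k
    centroids = update_centroids(points, labels, centroids)

    # Later iterations: distances are computed only to each point's candidates.
    for _ in range(max_iter - 1):
        new_labels = []
        for p, cand in zip(points, candidates):
            new_labels.append(min(cand, key=lambda c: sq_dist(p, centroids[c])))
            distance_evals += len(cand)
        if new_labels == labels:  # converged: no membership changed
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels, distance_evals


def mse(points, centroids, labels):
    """Mean squared distance of each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels)) / len(points)


def make_blobs(seed=0, per_blob=30):
    """Hypothetical toy data: three well-separated 2-D blobs."""
    rng = random.Random(seed)
    centers = [(0, 0), (10, 10), (20, 0)]
    return [
        [cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)]
        for cx, cy in centers
        for _ in range(per_blob)
    ]


if __name__ == "__main__":
    pts = make_blobs()
    init = [pts[0], pts[30], pts[60]]
    for m in (3, 2):  # m == k reproduces the full distance computation
        cents, labs, evals = candidate_kmeans(pts, init, n_candidates=m)
        print(m, evals, mse(pts, cents, labs))
```

Setting `n_candidates` equal to the number of clusters recovers the usual all-centroids assignment step, so the same function can serve as the baseline when comparing distance evaluations and MSE. Smaller values trade a possible increase in MSE for fewer distance computations per iteration.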

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.04600/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/1701.04600/full.md

---
Source: https://tomesphere.com/paper/1701.04600