Fast k-means algorithm clustering

Raied Salman; Vojislav Kecman; Qi Li; Robert Strack; Erik Test

arXiv:1108.1351 · cs.DS · August 8, 2011

TL;DR

This paper introduces a two-stage k-means clustering algorithm that significantly reduces computation time for large datasets by using a small subset of data for initial center estimation, then refining with the full dataset.

Contribution

A novel two-stage approach for k-means clustering that accelerates processing of large datasets by combining fast initial estimation with precise refinement.

Findings

01

Achieves 1-9 times speed-up on large datasets

02

Effective initial center estimation reduces total computation time

03

Maintains clustering accuracy comparable to standard k-means

Abstract

k-means has recently been recognized as one of the best algorithms for clustering unlabeled data. Because k-means depends mainly on calculating the distance between every data point and every center, its time cost is high when the dataset is large (for example, more than 500 million points). We propose a two-stage algorithm to reduce the time cost of distance calculation for huge datasets. The first stage is a fast distance calculation that uses only a small portion of the data to produce the best possible location of the centers. The second stage is a slow distance calculation in which the initial centers are taken from the first stage. The names "fast" and "slow" refer to the speed at which the centers move. In the slow stage, the whole dataset can be used to get the exact location of the centers. The time cost of the distance calculation for the fast stage is very…
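The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `lloyd`, `inertia`, `two_stage_kmeans`, and `sample_fraction` are hypothetical, and the uniform random subset, the 10% default fraction, and the convergence tolerance are choices made here because the abstract does not specify them. Both stages run ordinary Lloyd iterations; only the data they see differs.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def inertia(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, max_iter=100, tol=1e-12):
    """Standard Lloyd iterations from the given centers.

    Returns (centers, iterations_run).
    """
    centers = [list(c) for c in centers]
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center (an assumption of this sketch).
        new_centers = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else old
            for members, old in zip(clusters, centers)
        ]
        shift = max(sq_dist(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, iterations


def two_stage_kmeans(points, k, sample_fraction=0.1, seed=0):
    """Fast stage on a small subset, then slow stage on the full dataset."""
    rng = random.Random(seed)
    # Fast stage: cluster a small random subset (sampling scheme is assumed).
    n_sample = min(len(points), max(k, int(len(points) * sample_fraction)))
    sample = rng.sample(points, n_sample)
    fast_centers, fast_iters = lloyd(sample, rng.sample(sample, k))
    # Slow stage: refine on all points, starting from the fast-stage centers.
    centers, slow_iters = lloyd(points, fast_centers)
    return centers, fast_centers, fast_iters, slow_iters


# Usage: two synthetic 2-D blobs of 500 points each.
data_rng = random.Random(1)
data = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (10, 10)]
    for _ in range(500)
]
centers, fast_centers, fast_iters, slow_iters = two_stage_kmeans(data, k=2)
print("fast-stage iterations:", fast_iters, "slow-stage iterations:", slow_iters)
print("inertia after fast stage:", inertia(data, fast_centers))
print("inertia after slow stage:", inertia(data, centers))
```

Each fast-stage iteration touches only the sampled points, so it costs a fraction of a full pass. The expensive full-data passes happen only in the slow stage, which starts from centers that are presumably already close to their final positions.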
