A Faster $k$-means++ Algorithm

Jiehao Liang; Somdeb Sarkhel; Zhao Song; Chenbo Yin; Junze Yin; Danyang Zhuo

arXiv:2211.15118 · cs.DS · February 15, 2024

Open Access

TL;DR

This paper introduces FastKmeans++, an algorithm for $k$-means++ initialization that runs in $O(nd + nk^2)$ time on $n$ points in $\mathbb{R}^d$, down from the $O(ndk^2)$ of the prior state of the art; the authors describe this running time as nearly optimal.

Contribution

The paper presents FastKmeans++, which lowers the running time of $k$-means++ initialization from $O(ndk^2)$ to $O(nd + nk^2)$ for $n$ points in $\mathbb{R}^d$ and $k$ centers.
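
To give a rough sense of the gap between the two bounds, one can divide the old bound by the new one. This is only a comparison of the stated asymptotic expressions and ignores the constants hidden by the $O(\cdot)$ notation, so it indicates scaling rather than measured speedup:

$$
\frac{ndk^2}{nd + nk^2} \;=\; \frac{dk^2}{d + k^2},
\qquad
\frac{1}{2}\min(d, k^2) \;\le\; \frac{dk^2}{d + k^2} \;\le\; \min(d, k^2).
$$

The two inequalities follow from $\max(d, k^2) \le d + k^2 \le 2\max(d, k^2)$. As an illustrative case (values chosen here, not taken from the paper), $d = 100$ and $k = 10$ give $\frac{100 \cdot 100}{100 + 100} = 50$.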

Findings

01

FastKmeans++ runs in $O(nd + nk^2)$ time in total.

02

This running time is nearly optimal for $k$-means++ initialization.

03

It improves on the previous state of the art, which takes $O(ndk^2)$ time: $O(k)$ iterations at $O(ndk)$ time each.

Abstract

$k$-means++ is an important algorithm for choosing initial cluster centers for the $k$-means clustering algorithm. In this work, we present a new algorithm that can solve the $k$-means++ problem with nearly optimal running time. Given $n$ data points in $\mathbb{R}^{d}$, the current state-of-the-art algorithm runs in $O(k)$ iterations, and each iteration takes $O(ndk)$ time. The overall running time is thus $O(ndk^{2})$. We propose a new algorithm, FastKmeans++, that takes only $O(nd + nk^{2})$ time in total.
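
For readers unfamiliar with the seeding step, the following is a minimal sketch of classical $k$-means++ ($D^2$) sampling, written naively so that each round recomputes every point's squared distance to every center chosen so far. That makes one round cost on the order of $n \cdot d \cdot (\text{number of centers})$, which is the kind of per-iteration cost the abstract refers to. It is not FastKmeans++, and it may differ from the exact baseline the paper analyzes; the names `sq_dist` and `kmeanspp_seed` are hypothetical and chosen for this illustration only.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples: O(d)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Naive D^2 seeding sketch (hypothetical helper, not the paper's method).

    points: list of equal-length tuples of floats
    k:      number of centers to pick
    rng:    optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Cost of a point = squared distance to its nearest chosen center.
        # Recomputed from scratch here: O(n * d * len(centers)) per round.
        costs = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(costs) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        # Sample the next center with probability proportional to its cost.
        idx = rng.choices(range(len(points)), weights=costs, k=1)[0]
        centers.append(points[idx])
    return centers


if __name__ == "__main__":
    # Illustrative data: two well-separated pairs of points.
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeanspp_seed(data, 2, random.Random(0)))
```

Points already chosen as centers have cost zero, so the weighted draw does not pick them again while any point still has positive cost.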

Taxonomy

Topics: Advanced Clustering Algorithms Research · Facility Location and Emergency Management · Data Management and Algorithms

Methods: k-Means Clustering