A Faster $k$-means++ Algorithm
Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Junze Yin, Danyang Zhuo

TL;DR
This paper introduces FastKmeans++, an algorithm for k-means++ initialization whose running time is nearly optimal, making it substantially faster than prior methods on large datasets.
Contribution
The paper presents FastKmeans++, an algorithm that improves initialization time for k-means++ from O(n d k^2) to O(n d + n k^2), enhancing efficiency for large-scale data.
Findings
FastKmeans++ runs in O(n d + n k^2) time
Achieves nearly optimal running time for k-means++ initialization
Significantly faster than previous algorithms for large datasets
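To get a rough feel for the gap between the two bounds stated above, the following sketch evaluates their ratio for a few problem sizes. It ignores constants and any logarithmic factors, so it indicates asymptotic behavior only and does not predict wall-clock speedup. The function name `bound_ratio` and the sample values of n, d, and k are hypothetical, chosen purely for illustration.

```python
def bound_ratio(n: int, d: int, k: int) -> float:
    """Ratio of the prior bound n*d*k^2 to the FastKmeans++ bound n*d + n*k^2.

    Constants and logarithmic factors are ignored, so this is only a rough
    indication of asymptotic behavior, not a measured speedup.
    """
    prior = n * d * k**2
    fast = n * d + n * k**2
    return prior / fast


if __name__ == "__main__":
    # Hypothetical problem sizes, for illustration only.
    for n, d, k in [(10**6, 100, 10), (10**6, 100, 50), (10**6, 1000, 50)]:
        print(f"n={n}, d={d}, k={k}: ratio ~ {bound_ratio(n, d, k):.1f}")
```

The ratio simplifies to d k^2 / (d + k^2), so it does not depend on n. It grows toward k^2 when d is much larger than k^2, and toward d when k^2 is much larger than d.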
Abstract
$k$-means++ is an important algorithm for choosing initial cluster centers for the $k$-means clustering algorithm. In this work, we present a new algorithm that can solve the $k$-means++ problem with nearly optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs in $O(k)$ iterations, and each iteration takes $O(ndk)$ time. The overall running time is thus $O(ndk^2)$. We propose a new algorithm, FastKmeans++, that takes only $O(nd + nk^2)$ time in total.
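For background on what "choosing initial cluster centers" involves, here is a minimal sketch of the classic k-means++ seeding rule (D^2 sampling, due to Arthur and Vassilvitskii). It is not the paper's FastKmeans++, and it is not necessarily the state-of-the-art baseline whose running time the abstract quotes. The names `kmeanspp_seed` and `squared_distance` are hypothetical, and the code uses only the Python standard library.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Classic k-means++ seeding (D^2 sampling); NOT the paper's FastKmeans++."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Refresh nearest-center distances using only the new center.
        d2 = [min(old, squared_distance(p, nxt)) for old, p in zip(d2, points)]

    return centers
```

A possible call is `kmeanspp_seed(data, k=3, rng=random.Random(0))`, where `data` is a list of equal-length tuples of floats. Passing an explicit `random.Random` instance keeps the sampling reproducible.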
Taxonomy
Topics: Advanced Clustering Algorithms Research · Facility Location and Emergency Management · Data Management and Algorithms
Methods: k-Means Clustering
