On the Efficiency of K-Means Clustering: Evaluation, Optimization, and Algorithm Selection
Sheng Wang, Yuan Sun, Zhifeng Bao

TL;DR
This paper evaluates and optimizes methods for accelerating k-means clustering, introduces a unified framework for performance analysis, and explores automatic method selection via machine learning.
Contribution
It develops UniK, a unified evaluation framework, and proposes an optimized hybrid algorithm with automatic method selection for improved k-means efficiency.
Findings
UniK enables detailed performance analysis of acceleration methods.
The hybrid algorithm outperforms individual methods in pruning efficiency.
Machine learning can effectively select the best acceleration method for a given task.
Abstract
This paper presents a thorough evaluation of the existing methods that accelerate Lloyd's algorithm for fast k-means clustering. To do so, we analyze the pruning mechanisms of existing methods, and summarize their common pipeline into a unified evaluation framework UniK. UniK embraces a class of well-known methods and enables a fine-grained performance breakdown. Within UniK, we thoroughly evaluate the pros and cons of existing methods using multiple performance metrics on a number of datasets. Furthermore, we derive an optimized algorithm over UniK, which effectively hybridizes multiple existing methods for more aggressive pruning. To take this further, we investigate whether the most efficient method for a given clustering task can be automatically selected by machine learning, to benefit practitioners and researchers.
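To make the idea of pruning in Lloyd's algorithm concrete, here is a minimal sketch of one well-known bound: the triangle-inequality test on inter-centroid distances. It is not UniK and not the paper's hybrid algorithm. The function names (`dist`, `assign_with_pruning`, `update_centroids`, `kmeans_pruned`) and all parameters are hypothetical, chosen only for illustration. The sketch assumes Euclidean distance and points given as lists of floats.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids, prev_assign):
    """One assignment step of Lloyd's algorithm with a simple pruning test.

    Returns (assignments, number of point-centroid distances skipped).
    """
    k = len(centroids)
    # Pairwise centroid distances, computed once per iteration.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    assign, skipped = [], 0
    for x, a in zip(points, prev_assign):
        # Start from the previously assigned centroid as the current best.
        best, best_d = a, dist(x, centroids[a])
        for j in range(k):
            if j == a:
                continue
            # Triangle inequality: d(x, c_j) >= d(c_best, c_j) - d(x, c_best).
            # So if d(c_best, c_j) >= 2 * d(x, c_best), c_j cannot be closer.
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = dist(x, centroids[j])
            if d < best_d:
                best, best_d = j, d
        assign.append(best)
    return assign, skipped


def update_centroids(points, assign, centroids):
    """Recompute each centroid as the mean of its assigned points."""
    k, dim = len(centroids), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, a in zip(points, assign):
        counts[a] += 1
        for t in range(dim):
            sums[a][t] += x[t]
    # An empty cluster keeps its old centroid (an assumption of this sketch).
    return [
        [s / counts[i] for s in sums[i]] if counts[i] else list(centroids[i])
        for i in range(k)
    ]


def kmeans_pruned(points, k, iters=20, seed=0):
    """Run Lloyd's algorithm with the pruning test above."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    total_skipped = 0
    for it in range(iters):
        new_assign, skipped = assign_with_pruning(points, centroids, assign)
        total_skipped += skipped
        converged = it > 0 and new_assign == assign
        assign = new_assign
        centroids = update_centroids(points, assign, centroids)
        if converged:
            break
    return centroids, assign, total_skipped
```

The pruning test never changes which centroid is nearest; it only avoids distance computations that cannot win. The skipped count gives a rough view of pruning efficiency, in the spirit of the fine-grained breakdown the abstract describes.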
Taxonomy
Topics
Advanced Clustering Algorithms Research · Data Management and Algorithms · Data Stream Mining Techniques
Methods
Pruning
