On the Efficiency of K-Means Clustering: Evaluation, Optimization, and Algorithm Selection

Sheng Wang; Yuan Sun; Zhifeng Bao

arXiv:2010.06654 · cs.DB · October 28, 2020 · 1 citation

TL;DR

This paper evaluates and optimizes methods for accelerating k-means clustering, introduces a unified framework for performance analysis, and explores automatic method selection via machine learning.

Contribution

It develops UniK, a unified evaluation framework, and proposes an optimized hybrid algorithm with automatic method selection for improved k-means efficiency.

Findings

01 UniK enables detailed performance analysis of acceleration methods.

02 The hybrid algorithm outperforms individual methods in pruning efficiency.

03 Machine learning can effectively select the best acceleration method for a given task.

Abstract

This paper presents a thorough evaluation of the existing methods that accelerate Lloyd's algorithm for fast k-means clustering. To do so, we analyze the pruning mechanisms of existing methods, and summarize their common pipeline into a unified evaluation framework UniK. UniK embraces a class of well-known methods and enables a fine-grained performance breakdown. Within UniK, we thoroughly evaluate the pros and cons of existing methods using multiple performance metrics on a number of datasets. Furthermore, we derive an optimized algorithm over UniK, which effectively hybridizes multiple existing methods for more aggressive pruning. To take this further, we investigate whether the most efficient method for a given clustering task can be automatically selected by machine learning, to benefit practitioners and researchers.
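
To make the idea of pruning in Lloyd's algorithm concrete, here is a minimal sketch. It is not UniK and not the paper's hybrid algorithm; it only illustrates one well-known triangle-inequality test of the kind such methods build on: if the distance between the current best centroid and another centroid is at least twice the point's distance to the best centroid, the other centroid cannot be closer and its distance need not be computed. The names `dist`, `assign`, and `lloyd`, the random initialization, and the fixed iteration count are all assumptions made for this illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids, prune):
    """Assign each point to its nearest centroid.

    Returns (labels, computed), where computed counts point-to-centroid
    distance evaluations only (inter-centroid distances are not counted).
    """
    k = len(centroids)
    # Inter-centroid distances are only needed for the pruning test.
    cc = [[dist(a, b) for b in centroids] for a in centroids] if prune else None
    labels, computed = [], 0
    for p in points:
        best, best_d = 0, dist(p, centroids[0])
        computed += 1
        for j in range(1, k):
            # Triangle inequality: d(c_best, c_j) <= d(p, c_best) + d(p, c_j),
            # so d(c_best, c_j) >= 2 * d(p, c_best) implies d(p, c_j) >= d(p, c_best).
            if prune and cc[best][j] >= 2 * best_d:
                continue
            d = dist(p, centroids[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


def lloyd(points, k, iters=10, prune=True, seed=0):
    """Plain Lloyd's iterations; returns labels from the last assignment step,
    the final centroids, and the total number of point-to-centroid distances."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels, total = [], 0
    for _ in range(iters):
        labels, computed = assign(points, centroids, prune)
        total += computed
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids, total
```

Calling `lloyd(points, k, prune=True)` and `lloyd(points, k, prune=False)` on the same data should yield the same assignments, with the pruned run evaluating no more point-to-centroid distances than the unpruned one. How much is saved depends on the data and on `k`, which is the kind of trade-off the evaluation in the paper is concerned with.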

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Data Stream Mining Techniques

Methods: Pruning