Mini-Batch Kernel $k$-means

Ben Jourdan; Gregory Schwartzman

arXiv:2410.05902·cs.LG·October 10, 2024


TL;DR

This paper introduces the first mini-batch kernel $k$-means algorithm, which runs 10-100x faster than full-batch kernel $k$-means in the reported experiments with minimal loss in quality, and supports it with a theoretical analysis under an early stopping condition.

Contribution

It presents a novel mini-batch kernel $k$-means algorithm with proven convergence and approximation guarantees, enabling scalable kernel clustering.

Findings

1. Achieves 10-100x speedup over full kernel $k$-means.

2. Maintains clustering quality with minimal loss.

3. Provides theoretical analysis with convergence and approximation bounds.

Abstract

We present the first mini-batch kernel $k$-means algorithm, offering an order of magnitude improvement in running time compared to the full batch algorithm. A single iteration of our algorithm takes $O(kb^{2})$ time, significantly faster than the $O(n^{2})$ time required by the full batch kernel $k$-means, where $n$ is the dataset size and $b$ is the batch size. Extensive experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality, addressing the slow runtime that has limited kernel $k$-means adoption in practice. We further complement these results with a theoretical analysis under an early stopping condition, proving that with a batch size of $\Omega(\max\{\gamma^{4}, \gamma^{2}\} \cdot \epsilon^{-2})$, the algorithm terminates in $O(\gamma^{2} / \epsilon)$ iterations with high probability, where $\gamma$ bounds the…
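To make the per-iteration cost concrete, here is a minimal Python sketch of one way a mini-batch kernel $k$-means loop can be organized. It is not the authors' algorithm. As a simplifying assumption, each center is re-estimated as the feature-space mean of the batch points assigned to it, and the paper's own update rule is not reproduced here. The function names `rbf_kernel`, `assign`, and `minibatch_kernel_kmeans`, the RBF kernel choice, and the reseeding of empty clusters are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def assign(points, support, W, kernel=rbf_kernel):
    # Each center j is the implicit feature-space point sum_i W[j, i] * phi(support[i]).
    # ||phi(x) - c_j||^2 = K(x, x) - 2 * sum_i W[j, i] K(x, s_i) + W[j] K_ss W[j]^T;
    # K(x, x) is the same for every center, so it is dropped before the argmin.
    K_ps = kernel(points, support)            # (p, m)
    K_ss = kernel(support, support)           # (m, m)
    cross = K_ps @ W.T                        # (p, k)
    center_sq_norms = ((W @ K_ss) * W).sum(1) # (k,)
    return np.argmin(center_sq_norms[None, :] - 2.0 * cross, axis=1)

def minibatch_kernel_kmeans(X, k, b, n_iters, kernel=rbf_kernel, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Start from k distinct data points; W is the identity over this support.
    support = X[rng.choice(n, size=k, replace=False)]
    W = np.eye(k)
    for _ in range(n_iters):
        batch = X[rng.choice(n, size=b, replace=False)]
        labels = assign(batch, support, W, kernel)
        # Assumed update: each center becomes the mean of its assigned batch points.
        new_W = np.zeros((k, b))
        for j in range(k):
            members = labels == j
            if members.any():
                new_W[j, members] = 1.0 / members.sum()
            else:
                # Illustrative choice: reseed an empty cluster at a random batch point.
                new_W[j, rng.integers(b)] = 1.0
        support, W = batch, new_W
    return support, W
```

In this sketch every iteration works only with kernel matrices of size at most $b \times b$, so its cost depends on $b$ and $k$ rather than on $n$. Labeling the full dataset afterwards with `assign(X, support, W)` takes $O(nb)$ kernel evaluations.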


Taxonomy

Topics: Face and Expression Recognition

Methods: Early Stopping