Fast $k$-means Seeding Under The Manifold Hypothesis

Poojan Shah; Shashwat Agrawal; Ragesh Jaiswal

arXiv:2602.01104·cs.DS·February 3, 2026

TL;DR

This paper introduces a new seeding method for $k$-means clustering, leveraging the manifold hypothesis to achieve faster runtimes with predictable approximation guarantees, validated through extensive empirical testing.

Contribution

It proposes $\text{Qkmeans}$, a novel seeding algorithm that exploits geometric properties of data on low-dimensional manifolds for improved efficiency and theoretical guarantees.

Findings

01

$\text{Qkmeans}$ achieves an $O(\rho^{-2} \log k)$ approximation.

02

The algorithm runs in $O(nD) + \tilde{O}(\varepsilon^{1+\rho} \rho^{-1} k^{1+\gamma})$ time.

03

Empirical results validate theoretical predictions across various domains.

Abstract

We study beyond-worst-case analysis for the $k$-means problem, where the goal is to model typical instances of $k$-means arising in practice. Existing theoretical approaches provide guarantees under certain assumptions on the optimal solutions to $k$-means, making them difficult to validate in practice. We propose the manifold hypothesis, where data obtained in ambient dimension $D$ concentrates around a low-dimensional manifold of intrinsic dimension $d$, as a reasonable assumption to model real-world clustering instances. We identify key geometric properties of datasets which have theoretically predictable scaling laws depending on the quantization exponent $\varepsilon = 2/d$, using techniques from optimum quantization theory. We show how to exploit these regularities to design a fast seeding method called $\text{Qkmeans}$ which provides $O(\rho^{-2} \log k)$ approximate…
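To make the role of the quantization exponent concrete, the sketch below assumes a power-law form for the optimal $k$-means cost, $\mathrm{cost}(k) \approx C \cdot k^{-\varepsilon}$ with $\varepsilon = 2/d$, as suggested by classical quantization theory. The abstract only states that the scaling laws depend on $\varepsilon$, so this exact form, the constant $C$, and the helper names `quantization_exponent` and `fit_exponent` are illustrative assumptions, not the paper's method. The code fits the exponent from (k, cost) pairs by least squares in log-log space and inverts it to an implied intrinsic dimension.

```python
import math


def quantization_exponent(d: float) -> float:
    """Quantization exponent eps = 2/d for intrinsic dimension d (from the abstract)."""
    if d <= 0:
        raise ValueError("intrinsic dimension d must be positive")
    return 2.0 / d


def fit_exponent(ks, costs):
    """Least-squares slope of log(cost) against log(k), returned with flipped sign.

    ASSUMPTION: cost(k) ~ C * k**(-eps); this power-law form is a hypothesis
    used for illustration, not a statement taken from the paper.
    """
    if len(ks) != len(costs) or len(ks) < 2:
        raise ValueError("need at least two (k, cost) pairs of equal length")
    xs = [math.log(k) for k in ks]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic costs generated from the assumed law with d = 4 (eps = 0.5);
    # the constant 100.0 is arbitrary and hypothetical.
    d = 4
    ks = [8, 16, 32, 64, 128]
    costs = [100.0 * k ** (-quantization_exponent(d)) for k in ks]
    eps_hat = fit_exponent(ks, costs)
    print("fitted exponent:", eps_hat)
    print("implied intrinsic dimension:", 2.0 / eps_hat)
```

On real data, the costs would come from running a $k$-means solver at several values of $k$; the fitted exponent is then only as reliable as the assumed power law and the quality of those solutions.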

Taxonomy

Topics: Advanced Clustering Algorithms Research · Facility Location and Emergency Management · Stochastic Gradient Optimization Techniques