Improved Smoothed Analysis of the k-Means Method

Bodo Manthey; Heiko Röglin

arXiv:0809.1715·cs.DS·September 11, 2008·1 cites

TL;DR

This paper improves the theoretical understanding of the smoothed running-time of the k-means clustering algorithm, providing tighter bounds that better match its practical efficiency, especially in low-dimensional and one-dimensional cases.

Contribution

The authors present two new upper bounds on the expected smoothed running-time of k-means, refining the earlier bound of Arthur and Vassilvitskii, and show that k-means runs in smoothed polynomial time on one-dimensional data.

Findings

01

Expected running-time bounded by a polynomial in n^{√k} and σ^{-1}

02

Expected running-time bounded by k^{kd}·poly(n, σ^{-1}), where d is the dimension of the data

03

k-means runs in smoothed polynomial time for one-dimensional data
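
To get a feel for how much the first bound improves on the earlier poly(n^{k}, σ^{-1}) bound of Arthur and Vassilvitskii, the worked comparison below looks only at the power of n that enters each bound. The values n = 1000 and k = 16 are illustrative assumptions, not taken from the paper, and the unspecified polynomial wrapped around each term is ignored.

```latex
\[
n^{k} = 1000^{16} = 10^{48},
\qquad
n^{\sqrt{k}} = 1000^{\sqrt{16}} = 1000^{4} = 10^{12}.
\]
```

The gap widens as k grows, since k appears in the exponent of the earlier bound but only √k appears in the new one.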

Abstract

The k-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii (FOCS 2006) aimed at closing this gap, and they proved a bound of $\mathrm{poly}(n^{k}, \sigma^{-1})$ on the smoothed running-time of the k-means method, where $n$ is the number of data points and $\sigma$ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running-time observed in practice. We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in $n^{\sqrt{k}}$ and $\sigma^{-1}$. Second, we prove an upper bound of…
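
The smoothed model described in the abstract can be explored empirically with a minimal sketch like the one below: each input point is perturbed by independent Gaussian noise with standard deviation sigma, the k-means method (Lloyd's algorithm) is run on the perturbed points, and the number of iterations is averaged over several perturbations. The function names, the choice of k random data points as initial centers, and the iteration cap are assumptions made for illustration. They are not taken from the paper, whose results are analytical bounds rather than experiments.

```python
import random


def lloyd_iterations(points, k, rng, max_iter=10_000):
    """Run the k-means method and count passes that change the assignment.

    Initial centers are k distinct data points chosen at random
    (an assumption of this sketch, not a requirement of the analysis).
    """
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for iteration in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            return iteration  # assignment is stable, so the method has converged
        labels = new_labels
        # Update step: move each center to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return max_iter


def smoothed_iterations(points, k, sigma, trials=20, seed=0):
    """Average the iteration count over Gaussian perturbations of the input."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perturbed = [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]
        total += lloyd_iterations(perturbed, k, rng)
    return total / trials
```

Calling `smoothed_iterations(points, k, sigma)` with `points` given as a list of coordinate tuples returns the average iteration count for that sigma. Repeating the call for several values of sigma gives a rough empirical picture of how the iteration count depends on the perturbation size.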

Taxonomy

Topics: Face and Expression Recognition · Neural Networks and Applications