Noisy k-means++ Revisited

Christoph Grunau; Ahmet Alper Özüdoğru; Václav Rozhoň

arXiv:2307.13685·cs.DS·July 26, 2023·1 cites

Open Access

TL;DR

This paper proves that the $k$-means++ algorithm maintains an $O(\log k)$ approximation ratio even when small adversarial noise is introduced in the sampling process, closing a gap in previous research.

Contribution

It establishes a tight $O(\log k)$ approximation guarantee for noisy $k$-means++, improving on the previous weaker bound of $O(\log^2 k)$.

Findings

01

The $k$-means++ algorithm retains its $O(\log k)$ approximation under noisy conditions.

02

The previous weaker bound of $O(\log^2 k)$ is improved to a tight $O(\log k)$ bound.

03

The analysis confirms robustness of $k$-means++ against small adversarial perturbations.

Abstract

The $k$-means++ algorithm by Arthur and Vassilvitskii [SODA 2007] is a classical and time-tested algorithm for the $k$-means problem. While being very practical, the algorithm also has good theoretical guarantees: its solution is $O(\log k)$-approximate, in expectation. In a recent work, Bhattacharya, Eube, Röglin, and Schmidt [ESA 2020] considered the following question: does the algorithm retain its guarantees if we allow for a slight adversarial noise in the sampling probability distributions used by the algorithm? This is motivated e.g. by the fact that computations with real numbers in $k$-means++ implementations are inexact. Surprisingly, the analysis under this scenario gets substantially more difficult and the authors were able to prove only a weaker approximation guarantee of $O(\log^{2} k)$. In this paper, we close the gap by providing a tight, $O(\log k)$-approximate…
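
To make the noisy sampling setting in the abstract concrete, here is a minimal Python sketch of $k$-means++ seeding with perturbed sampling weights. It is an illustration under stated assumptions, not the authors' construction. The function names (`noisy_kmeanspp`, `kmeans_cost`) and the parameter `eps` are hypothetical. The noise model is also an assumption: each unnormalized $D^2$ weight is scaled by a factor in $[1-\varepsilon, 1+\varepsilon]$, and random noise stands in for the adversary. The sketch shows only where the noise enters and does not demonstrate the approximation guarantee.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def noisy_kmeanspp(points, k, eps=0.1, rng=None):
    """k-means++ seeding with perturbed D^2 sampling weights.

    Assumption: the noise scales each unnormalized weight by a factor in
    [1 - eps, 1 + eps]; random noise replaces a true adversary here.
    May return fewer than k centers if all remaining weights are zero.
    """
    rng = rng or random.Random(0)
    # First center: chosen uniformly at random, as in standard k-means++.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Exact D^2 weights, each perturbed multiplicatively.
        weights = [d * rng.uniform(1 - eps, 1 + eps) for d in d2]
        if sum(weights) == 0:
            break  # every point already coincides with a chosen center
        idx = rng.choices(range(len(points)), weights=weights, k=1)[0]
        centers.append(points[idx])
        # Update nearest-center distances with the new center.
        d2 = [min(d, sq_dist(p, points[idx])) for p, d in zip(points, d2)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Usage on a toy data set with three well-separated pairs of points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
centers = noisy_kmeanspp(points, k=3, eps=0.1, rng=random.Random(42))
print(centers, kmeans_cost(points, centers))
```

Setting `eps=0` recovers exact $D^2$ sampling, so the same function can be used to compare the noisy and noiseless seeding costs empirically on a given data set.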

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Cryptography and Data Security · Privacy-Preserving Technologies in Data