Adapting $k$-means algorithms for outliers

Christoph Grunau; Václav Rozhoň

arXiv:2007.01118·cs.DS·September 26, 2022

TL;DR

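This paper adapts classical sampling-based $k$-means algorithms to the setting with outliers, in both sequential and distributed settings. The adapted algorithms output $(1+\varepsilon)z$ outliers, where $z$ is the number of true outliers, while achieving an $O(1/\varepsilon)$-approximation to the objective function.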

Contribution

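It adapts several sequential and distributed $k$-means algorithms to the setting with outliers, with substantially stronger guarantees than the earlier adaptation of $k$-means++ by Bhaskara et al. (NeurIPS 2019), which needs to output $O(\log(k) \cdot z)$ outliers to match the $O(\log k)$-approximation guarantee of $k$-means++.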

Findings

01

Algorithms output $(1+\varepsilon)z$ outliers

02

Achieve $O(1/\varepsilon)$-approximation to the objective

03

Matching lower bound of $\Omega(nk^2/z)$ in the oracle model
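
For reference, the first two findings can be read as a single statement. The notation here is assumed for illustration and may differ from the paper's: $X$ is the input point set, $C$ the $k$ output centers, $X_{\mathrm{out}}$ the set of points the algorithm declares outliers, and $\mathrm{OPT}$ the optimal cost when at most $z$ points may be discarded.

$$
|X_{\mathrm{out}}| \le (1+\varepsilon)\, z,
\qquad
\sum_{x \in X \setminus X_{\mathrm{out}}} \min_{c \in C} \lVert x - c \rVert^2 \;\le\; O(1/\varepsilon) \cdot \mathrm{OPT}.
$$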

Abstract

This paper shows how to adapt several simple and classical sampling-based algorithms for the $k$-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical $k$-means++ algorithm to the setting with outliers. However, their algorithm needs to output $O(\log(k) \cdot z)$ outliers, where $z$ is the number of true outliers, to match the $O(\log k)$-approximation guarantee of $k$-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed $k$-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1 + \varepsilon) z$ outliers while achieving an $O(1/\varepsilon)$-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the…
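
To make the objective in the abstract concrete, the sketch below evaluates the $k$-means cost of a fixed set of centers after discarding the farthest points as outliers. It is not any of the paper's algorithms; it only shows the quantity those algorithms approximate. The function names (`cost_with_outliers`, `outlier_budget`) and the choice to round the budget $(1+\varepsilon)z$ down are assumptions made for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost_with_outliers(points: Sequence[Point],
                       centers: Sequence[Point],
                       num_outliers: int) -> float:
    """k-means cost of `centers` once the `num_outliers` farthest points are dropped.

    For fixed centers, discarding the points with the largest distance to
    their nearest center is the best possible choice of outliers.
    """
    # Squared distance of every point to its nearest center, ascending.
    nearest = sorted(min(squared_distance(p, c) for c in centers) for p in points)
    # Keep everything except the `num_outliers` largest distances.
    kept = nearest[: max(0, len(nearest) - num_outliers)]
    return sum(kept)


def outlier_budget(z: int, eps: float) -> int:
    """Number of outliers allowed under a (1 + eps) * z budget (rounded down; an assumption)."""
    return math.floor((1 + eps) * z)


# Toy data: three points near two centers and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

print(cost_with_outliers(points, centers, 0))  # far point dominates the cost
print(cost_with_outliers(points, centers, 1))  # far point discarded
```

In the toy data, a single distant point accounts for almost all of the cost when no outliers are allowed, which is why the number of points an algorithm may discard matters as much as its approximation factor.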

Taxonomy

Topics: Machine Learning and Algorithms · Anomaly Detection Techniques and Applications · Sparse and Compressive Sensing Techniques