Adapting $k$-means algorithms for outliers
Christoph Grunau, Václav Rozhoň

TL;DR
This paper adapts classical sampling-based $k$-means algorithms to data with outliers, in both sequential and distributed settings, with stronger theoretical guarantees and tighter control over how many points are declared outliers than prior work.
Contribution
It introduces outlier-aware adaptations of several sequential and distributed $k$-means algorithms that give stronger theoretical guarantees than the earlier adaptation of $k$-means++ by Bhaskara et al. (NeurIPS 2019), while outputting close to the true number of outliers.
Findings
Algorithms output $(1+\varepsilon)z$ outliers
Achieve $O(1/\varepsilon)$-approximation to the objective
Matching lower bound of $\Omega(nk^2/z)$ in the oracle model
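To make "the objective" in the findings above concrete, here is a minimal sketch of the standard $k$-means cost with outliers: the sum of squared distances to the nearest center after discarding the $z$ points farthest from their nearest center. The function name `kmeans_cost_with_outliers` and the toy data are illustrative assumptions, not taken from the paper.

```python
def kmeans_cost_with_outliers(points, centers, z):
    """Sum of squared distances to the nearest center, ignoring the z farthest points.

    Illustrative helper (hypothetical name); points and centers are tuples of floats.
    """
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    # Squared distance from each point to its nearest center, in ascending order.
    nearest = sorted(min(sq_dist(p, c) for c in centers) for p in points)
    # Drop the z largest distances: those points are treated as outliers.
    kept = nearest[:max(len(nearest) - z, 0)]
    return sum(kept)


# Toy data: two tight clusters plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (100.0, 100.0)]
centers = [(0.0, 0.5), (10.0, 10.5)]

cost_z0 = kmeans_cost_with_outliers(points, centers, z=0)  # far point dominates
cost_z1 = kmeans_cost_with_outliers(points, centers, z=1)  # far point discarded
```

With `z=0` the single distant point accounts for almost the entire cost, while with `z=1` only the four clustered points contribute. An algorithm that outputs $(1+\varepsilon)z$ outliers is allowed to discard slightly more points than the $z$ the optimal solution discards.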
Abstract
This paper shows how to adapt several simple and classical sampling-based algorithms for the $k$-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical $k$-means++ algorithm to the setting with outliers. However, their algorithm needs to output $O(\log(k) \cdot z)$ outliers, where $z$ is the number of true outliers, to match the $O(\log k)$-approximation guarantee of $k$-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed $k$-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1+\varepsilon)z$ outliers while achieving an $O(1/\varepsilon)$-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the…
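For context, the classical $k$-means++ seeding that the abstract refers to can be sketched as below. This is the standard, outlier-unaware procedure (each new center is sampled with probability proportional to its squared distance from the nearest center chosen so far), not the adaptation of Bhaskara et al. or of this paper. The names `kmeanspp_seeding` and `sq_dist` and the `seed` parameter are illustrative assumptions.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeanspp_seeding(points, k, seed=0):
    """Classical k-means++ seeding (D^2 sampling); hypothetical helper name."""
    rng = random.Random(seed)
    # First center: uniform at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Sample proportionally to squared distance to the nearest center.
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        d2 = [min(d, sq_dist(p, new_center)) for d, p in zip(d2, points)]
    return centers


sample_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (100.0, 100.0)]
seeds = kmeanspp_seeding(sample_points, k=3, seed=1)
```

Because the sampling weight grows with squared distance, a far-away outlier such as `(100.0, 100.0)` is very likely to be picked as a center, which is why the unmodified procedure needs adapting when outliers are present.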
Taxonomy
Topics: Machine Learning and Algorithms · Anomaly Detection Techniques and Applications · Sparse and Compressive Sensing Techniques
