Near-optimal Coresets for Robust Clustering
Lingxiao Huang, Shaofeng H.-C. Jiang, Jianing Lou, Xuan Wu

TL;DR
This paper introduces near-optimal epsilon-coresets for robust clustering (k-Median and k-Means with m outliers) in Euclidean space. The coreset shrinks the dataset while preserving the clustering cost within epsilon-relative error for every center set, which enables faster algorithms for outlier-resistant clustering.
Contribution
The paper presents the first epsilon-coreset whose size is linear in m and polynomial in k and epsilon^{-1}, improving on previous bounds that were exponential in m + k, and adapts recent coreset frameworks to the outlier setting.
Findings
Coresets of size O(m + poly(k, epsilon^{-1})) constructed in near-linear time
Superior size-accuracy tradeoff compared to uniform and sensitivity sampling
Enables significant speedups in robust clustering algorithms
Abstract
We consider robust clustering problems in R^d, specifically k-clustering problems (e.g., k-Median and k-Means with m outliers), where the cost for a given center set C ⊂ R^d aggregates the distances from C to all but the furthest m data points, instead of all points as in classical clustering. We focus on the epsilon-coreset for robust clustering, a small proxy of the dataset that preserves the clustering cost within epsilon-relative error for all center sets. Our main result is an epsilon-coreset of size O(m + poly(k, epsilon^{-1})) that can be constructed in near-linear time. This significantly improves previous results, which either suffer an exponential dependence on (m + k) [Feldman and Schulman, SODA'12], or have a weaker bi-criteria guarantee [Huang et al., FOCS'18]. Furthermore, we show this dependence in m is…
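The robust cost described in the abstract can be illustrated with a short sketch. This is not the paper's coreset construction; it only evaluates the objective that a coreset is meant to preserve. The function name `robust_cost`, the exponent parameter `z` (1 for k-Median, 2 for k-Means), and the handling of weighted points (removing total weight m from the farthest points, possibly fractionally) are assumptions made for illustration.

```python
import numpy as np


def robust_cost(points, centers, m, z=1, weights=None):
    """Clustering cost after discarding total weight m of the farthest points.

    Hypothetical helper: z=1 corresponds to k-Median, z=2 to k-Means.
    With unit weights this drops exactly the m farthest points.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    kept = np.asarray(weights, dtype=float).copy()

    # Distance from each point to its nearest center, raised to the power z.
    diffs = points[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1) ** z

    # Walk from the farthest point inward, removing weight until m is used up.
    remaining = float(m)
    for i in np.argsort(-dist):
        if remaining <= 0:
            break
        removed = min(kept[i], remaining)
        kept[i] -= removed
        remaining -= removed

    return float((kept * dist).sum())


# Four points on a line, one center at 1, one outlier allowed.
data = [[0.0], [1.0], [2.0], [100.0]]
print(robust_cost(data, [[1.0]], m=1))  # the point at 100 is ignored
print(robust_cost(data, [[1.0]], m=0))  # classical k-Median cost
```

Under these assumptions, a weighted subset S with weights w would be an epsilon-coreset if `robust_cost(S, C, m, weights=w)` stays within a (1 ± epsilon) factor of `robust_cost(X, C, m)` for every center set C of size k. Checking a handful of center sets empirically does not prove the guarantee, which must hold for all of them.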
Taxonomy
Topics: Facility Location and Emergency Management · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
