Clustering under Perturbation Resilience
Maria Florina Balcan, Yingyu Liang

TL;DR
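This paper studies clustering under perturbation resilience, the assumption that the optimal clustering is preserved when distances are perturbed by a small multiplicative factor. It gives an algorithm that finds the optimal clustering for center-based objectives, develops new linkage criteria, and is motivated by real-world data where distances are heuristic.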
Contribution
It introduces algorithms for perturbation-resilient clustering, including optimal solutions for center-based objectives and bounds for k-median under relaxed assumptions.
Findings
Optimal clustering of center-based instances resilient to perturbations of factor 1 + √2
First bounds for k-median when the optimal clustering may change on a small fraction of points after perturbation
Sublinear-time algorithms using random sampling
Abstract
Motivated by the fact that distances between data points in many real-world clustering instances are often based on heuristic measures, Bilu and Linial~\cite{BL} proposed analyzing objective-based clustering problems under the assumption that the optimum clustering to the objective is preserved under small multiplicative perturbations to distances between points. The hope is that by exploiting the structure in such instances, one can overcome worst-case hardness results. In this paper, we provide several results within this framework. For center-based objectives, we present an algorithm that can optimally cluster instances resilient to perturbations of factor $1 + \sqrt{2}$, solving an open problem of Awasthi et al.~\cite{ABS10}. For $k$-median, a center-based objective of special interest, we additionally give algorithms for a more relaxed assumption in which we allow the optimal…
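To make the resilience assumption concrete, the following is a minimal sketch that empirically probes it on a toy instance. It assumes the usual formalization in which a perturbed distance d' satisfies d(x, y) ≤ d'(x, y) ≤ α · d(x, y), and it uses the k-median objective with centers restricted to data points. The function names (`kmedian_cost`, `optimal_partition`, `perturb`, `looks_resilient`) are hypothetical, and the exhaustive search over centers is only an illustration for tiny inputs, not the algorithm from the paper.

```python
import itertools
import math
import random


def kmedian_cost(dist, centers):
    """Sum over all points of the distance to the nearest chosen center."""
    return sum(min(row[c] for c in centers) for row in dist)


def optimal_partition(dist, k):
    """Brute-force the best k centers and return the partition they induce."""
    n = len(dist)
    best = min(itertools.combinations(range(n), k),
               key=lambda cs: kmedian_cost(dist, cs))
    clusters = {}
    for i in range(n):
        nearest = min(best, key=lambda c: dist[i][c])
        clusters.setdefault(nearest, set()).add(i)
    return frozenset(frozenset(members) for members in clusters.values())


def perturb(dist, alpha, rng):
    """Scale each pairwise distance by an independent factor in [1, alpha]."""
    n = len(dist)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            out[i][j] = out[j][i] = dist[i][j] * rng.uniform(1.0, alpha)
    return out


def looks_resilient(dist, k, alpha, trials=50, seed=0):
    """Return True if no sampled perturbation changes the optimal partition.

    Random sampling can only refute resilience, never prove it.
    """
    rng = random.Random(seed)
    target = optimal_partition(dist, k)
    return all(optimal_partition(perturb(dist, alpha, rng), k) == target
               for _ in range(trials))


# Toy instance (an assumption for illustration): two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 100.0, 101.0, 102.0]
dist = [[abs(p - q) for q in points] for p in points]
print(looks_resilient(dist, k=2, alpha=1 + math.sqrt(2)))
```

Because the two groups are far apart relative to α, every perturbation in the allowed range keeps the same optimal partition here. On instances with closer groups, a single sampled perturbation that changes the partition is enough to show the instance is not resilient for that α.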
Taxonomy
Topics: Data Management and Algorithms · Privacy-Preserving Technologies in Data · Facility Location and Emergency Management
