On the Sparsifiability of Correlation Clustering: Approximation Guarantees under Edge Sampling
Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

TL;DR
This paper investigates how much edge information is necessary for correlation clustering to retain approximation guarantees, revealing a structural dichotomy based on pseudometric properties and establishing bounds on sparsification and robustness.
Contribution
It introduces a sparsification-approximation framework for correlation clustering, providing optimal coreset sizes, a bound on the number of active triangle inequalities, and a robust approximation algorithm under edge sampling.
Findings
Optimal-size $\tilde{O}(n/\epsilon^2)$ coresets for clustering disagreement
At most $\binom{n}{2}$ triangle inequalities are active at any LP vertex
Robust $\frac{10}{3}$-approximation with $\tilde{\Theta}(n^{3/2})$ edges observed
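To make the coreset finding concrete, the sketch below computes the standard disagreement cost of a clustering on a complete signed graph (a positive pair split across clusters, or a negative pair placed in the same cluster, each costs 1) and estimates it from a uniform sample of vertex pairs rescaled by $\binom{n}{2}/m$. This is a generic sampling estimator, not the paper's coreset construction. The function names `disagreements` and `sampled_estimate` are hypothetical. The estimate is unbiased for any fixed clustering. A coreset must hold simultaneously for all clusterings, and that is where a VC-dimension argument is needed.

```python
import random
from itertools import combinations


def disagreements(labels, positive, pairs):
    """Count disagreements of a clustering on the given vertex pairs (u < v).

    Pairs listed in `positive` are '+' pairs; every other pair is a '-' pair
    (complete signed graph). A pair disagrees if it is '+' but split across
    clusters, or '-' but placed in the same cluster.
    """
    cost = 0
    for u, v in pairs:
        same_cluster = labels[u] == labels[v]
        is_positive = (u, v) in positive
        if is_positive != same_cluster:
            cost += 1
    return cost


def sampled_estimate(labels, positive, n, m, seed=0):
    """Estimate total disagreements from m pairs drawn uniformly with replacement.

    The sampled count is rescaled by (n choose 2) / m, which makes the estimate
    unbiased for this fixed clustering; its additive error shrinks like 1/sqrt(m).
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))
    sample = [rng.choice(all_pairs) for _ in range(m)]
    return disagreements(labels, positive, sample) * len(all_pairs) / m
```

For example, with `labels = [0, 0, 1, 1]` and positive pairs `{(0, 1), (2, 3), (0, 2)}`, the only disagreement over all pairs is the split positive pair `(0, 2)`. Putting all four vertices in one cluster instead costs 3, one for each negative pair.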
Abstract
Correlation Clustering (CC) is a fundamental unsupervised learning primitive whose strongest LP-based approximation guarantees rely on triangle inequality constraints that are prohibitive to enforce at scale. We initiate the study of sparsification-approximation trade-offs for CC, asking how much edge information is needed to retain LP-based guarantees. We establish a structural dichotomy between pseudometric and general weighted instances. On the positive side, we prove that the VC dimension of the clustering disagreement class is exactly …, yielding additive $\epsilon$-coresets of optimal size $\tilde{O}(n/\epsilon^2)$; that at most $\binom{n}{2}$ triangle inequalities are active at any LP vertex, enabling an exact cutting-plane solver; and that a sparsified variant of LP-PIVOT, which imputes missing LP marginals via triangle inequalities, achieves a robust…
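The sparsified LP-PIVOT idea can be illustrated with a minimal sketch built on two assumptions. First, LP values $x_{ij} \in [0, 1]$ are stored in a dict keyed by `(i, j)` with `i < j`. Second, each unobserved pair is imputed with the tightest triangle-inequality upper bound $x_{ij} \le x_{ik} + x_{kj}$ over observed middle vertices $k$, capped at 1. This imputation rule is hypothetical and is not the paper's rule. The rounding step uses the simplest PIVOT-style rule: include vertex $j$ in the pivot $p$'s cluster with probability $1 - x_{pj}$. The paper's exact rounding function and its $\frac{10}{3}$ analysis are not reproduced here. The names `impute` and `lp_pivot` are illustrative.

```python
import random


def impute(x, n):
    """Fill missing LP values using a HYPOTHETICAL triangle-inequality rule.

    For an unobserved pair (i, j), take min over observed paths i-k-j of
    x_ik + x_kj, capped at 1.0. Pairs with no such path default to 1.0
    (fully separated).
    """
    def get(i, j):
        return x.get((min(i, j), max(i, j)))

    full = dict(x)
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in x:
                continue
            bound = 1.0
            for k in range(n):
                if k in (i, j):
                    continue
                a, b = get(i, k), get(k, j)
                if a is not None and b is not None:
                    bound = min(bound, a + b)
            full[(i, j)] = bound
    return full


def lp_pivot(x, n, seed=0):
    """PIVOT-style rounding of a complete LP solution x (keys (i, j), i < j).

    Repeatedly pick a uniformly random unclustered pivot p and add each other
    unclustered vertex j to p's cluster with probability 1 - x_pj.
    Returns a list of cluster labels indexed by vertex.
    """
    rng = random.Random(seed)
    remaining = list(range(n))
    labels = [None] * n
    cluster_id = 0
    while remaining:
        p = rng.choice(remaining)
        members = [p]
        for j in remaining:
            if j != p and rng.random() < 1.0 - x[(min(p, j), max(p, j))]:
                members.append(j)
        for v in members:
            labels[v] = cluster_id
        member_set = set(members)
        remaining = [v for v in remaining if v not in member_set]
        cluster_id += 1
    return labels
```

With an integral solution such as `x = {(0, 1): 0, (2, 3): 0}` plus 1.0 for every other pair, the rounding is deterministic and recovers the clusters `{0, 1}` and `{2, 3}`. With fractional or imputed values, the output is random. Guarantees of this kind hold only in expectation over the pivot choices.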
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Face and Expression Recognition · Machine Learning and Algorithms
