Learning-Augmented Hierarchical Clustering
Vladimir Braverman, Jon C. Ergun, Chen Wang, Samson Zhou

TL;DR
This paper introduces hierarchical clustering algorithms that use auxiliary oracle information to obtain approximation guarantees beyond the known hardness barriers for standard hierarchical clustering objectives.
Contribution
It presents polynomial-time and near-linear-time algorithms that use a splitting oracle to improve the approximation ratios achievable for hierarchical clustering objectives.
Findings
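Polynomial-time algorithm with an O(1)-approximation for Dasgupta's objective.
Near-linear-time algorithm with a (1-o(1))-approximation for the Moseley-Wang objective.
Without an oracle, under the Small Set Expansion Hypothesis, no polynomial-time algorithm can achieve a constant-factor approximation for Dasgupta's objective, or a (1-C)-approximation for the Moseley-Wang objective for some constant C > 0.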
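For reference, the two objectives named in the findings can be computed directly from their standard definitions: for a tree T over a weighted similarity graph on n points, Dasgupta's cost is the sum over pairs of w_ij * |leaves(T[i v j])| (to be minimized), and the Moseley-Wang revenue is the sum of w_ij * (n - |leaves(T[i v j])|) (to be maximized), where T[i v j] is the subtree rooted at the lowest common ancestor of i and j. The sketch below is a minimal illustration, not the paper's code; the nested-tuple tree encoding, integer leaf labels, the pair-keyed weight dict, and the helper names `lca_sizes`, `dasgupta_cost`, and `moseley_wang_revenue` are all assumptions made for this example.

```python
from itertools import combinations


def lca_sizes(tree):
    """Return (sizes, n) for a tree given as nested tuples with leaf labels.

    sizes maps each leaf pair (i, j) with i < j to the number of leaves under
    their lowest common ancestor; n is the total number of leaves.
    """
    sizes = {}

    def visit(node):
        if not isinstance(node, tuple):  # a leaf label
            return {node}
        child_sets = [visit(child) for child in node]
        under = set().union(*child_sets)
        # Leaves taken from two different children first meet at this node,
        # so this node is their lowest common ancestor.
        for a, b in combinations(child_sets, 2):
            for i in a:
                for j in b:
                    sizes[(min(i, j), max(i, j))] = len(under)
        return under

    n = len(visit(tree))
    return sizes, n


def dasgupta_cost(tree, weights):
    """Dasgupta's cost: sum of w_ij * |leaves(T[i v j])| (lower is better)."""
    sizes, _ = lca_sizes(tree)
    return sum(w * sizes[(min(i, j), max(i, j))] for (i, j), w in weights.items())


def moseley_wang_revenue(tree, weights):
    """Moseley-Wang revenue: sum of w_ij * (n - |leaves(T[i v j])|) (higher is better)."""
    sizes, n = lca_sizes(tree)
    return sum(w * (n - sizes[(min(i, j), max(i, j))]) for (i, j), w in weights.items())


# Toy instance: {0, 1} and {2, 3} are strongly similar, with a weak 1-2 link.
weights = {(0, 1): 3, (2, 3): 3, (1, 2): 1}
good = ((0, 1), (2, 3))
bad = ((0, 2), (1, 3))
print(dasgupta_cost(good, weights), moseley_wang_revenue(good, weights))  # 16 12
print(dasgupta_cost(bad, weights), moseley_wang_revenue(bad, weights))    # 28 0
```

Since the two per-pair terms add up to w_ij * n, cost plus revenue equals n times the total weight for every tree (28 in this toy instance), so the same tree is optimal for both objectives. Approximation ratios, however, do not transfer directly between them, which is why the findings state separate guarantees for each.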
Abstract
Hierarchical clustering (HC) is an important data analysis technique in which the goal is to recursively partition a dataset into a tree-like structure while grouping together similar data points at each level of granularity. Unfortunately, many of the proposed HC objectives face strong barriers in the form of hardness-of-approximation results. Thus, we consider the problem of hierarchical clustering given auxiliary information from natural oracles. Specifically, we focus on a *splitting oracle* which, when provided with a triplet of vertices, answers (possibly erroneously) the pairs of vertices whose lowest common ancestor includes all three vertices in an optimal tree, i.e., identifying which vertex "splits away" from the others. Using such an oracle, we obtain the following results:
- A polynomial-time algorithm that outputs a hierarchical…
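To make the oracle's semantics concrete, the sketch below answers triplet queries from a known reference tree: in a rooted binary tree, exactly one of the three pairs has a lowest common ancestor strictly deeper than the ancestor of all three, and the vertex left out of that pair is the one that "splits away". This is only an illustration under stated assumptions. The paper's oracle answers with respect to an unknown optimal tree, and the noise wrapper here uses a hypothetical independent-error model (with probability `error_prob`, return one of the other two vertices uniformly at random) that may differ from the paper's. The names `root_paths`, `lca_depth`, `split_away`, and `noisy_split_away` are invented for this example.

```python
import random


def root_paths(tree, path=()):
    """Map each leaf of a nested-tuple tree to its root-to-leaf sequence of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    out = {}
    for index, child in enumerate(tree):
        out.update(root_paths(child, path + (index,)))
    return out


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def split_away(tree, u, v, w):
    """Noise-free splitting oracle: the vertex that splits away, or None if
    all three separate at the same (non-binary) node."""
    assert len({u, v, w}) == 3, "triplet must contain three distinct vertices"
    paths = root_paths(tree)
    # Key each vertex by the LCA depth of the *other* two vertices.
    depth = {
        w: lca_depth(paths[u], paths[v]),
        v: lca_depth(paths[u], paths[w]),
        u: lca_depth(paths[v], paths[w]),
    }
    deepest = max(depth.values())
    candidates = [x for x, d in depth.items() if d == deepest]
    return candidates[0] if len(candidates) == 1 else None


def noisy_split_away(tree, triplet, error_prob, rng):
    """Hypothetical noisy oracle: with probability error_prob, answer wrongly."""
    truth = split_away(tree, *triplet)
    if truth is not None and rng.random() < error_prob:
        return rng.choice([x for x in triplet if x != truth])
    return truth


T = ((0, 1), (2, 3))
print(split_away(T, 0, 1, 2))  # 2: leaves 0 and 1 merge below the root, so 2 splits away
print(split_away((0, 1, 2), 0, 1, 2))  # None: all three separate at one node
rng = random.Random(0)
noisy_answers = [noisy_split_away(T, (0, 1, 2), 0.2, rng) for _ in range(5)]
```

Encoding leaves by their child-index paths keeps the lowest-common-ancestor computation to a prefix comparison, which is sufficient for toy trees. An algorithm that queries the oracle many times would precompute these paths once instead of rebuilding them on every call.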
