Approximate Hierarchical Clustering via Sparsest Cut and Spreading Metrics
Moses Charikar, Vaggos Chatziafratis

TL;DR
This paper analyzes approximation algorithms for hierarchical clustering under Dasgupta's cost function. The algorithms are based on recursive sparsest cut and on spreading-metric relaxations, and the paper gives new approximation bounds, hardness results, and convex relaxations that may lead to improved algorithms.
Contribution
It proves that the recursive sparsest cut heuristic achieves an O(alpha_n) approximation, introduces a spreading-metric SDP relaxation with an integrality gap of O(sqrt{log n}), and establishes constant-factor hardness under the Small-Set Expansion (SSE) hypothesis.
Findings
The recursive sparsest cut heuristic achieves an O(alpha_n) approximation for hierarchical clustering, improving the previous O(alpha_n log n) bound.
The spreading-metric SDP relaxation has an integrality gap of at most O(sqrt{log n}).
Hierarchical clustering is hard to approximate within any constant factor under the SSE hypothesis.
Abstract
Dasgupta recently introduced a cost function for the hierarchical clustering of a set of points given pairwise similarities between them. He showed that this function is NP-hard to optimize, but a top-down recursive partitioning heuristic based on an alpha_n-approximation algorithm for uniform sparsest cut gives an approximation of O(alpha_n log n) (the current best algorithm has alpha_n=O(sqrt{log n})). We show that the aforementioned sparsest cut heuristic in fact obtains an O(alpha_n)-approximation for hierarchical clustering. The algorithm also applies to a generalized cost function studied by Dasgupta. Moreover, we obtain a strong inapproximability result, showing that the hierarchical clustering objective is hard to approximate to within any constant factor assuming the Small-Set Expansion (SSE) Hypothesis. Finally, we discuss approximation algorithms based on convex relaxations.…
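The top-down heuristic described in the abstract splits the current point set along a (approximately) sparsest uniform cut, meaning one that minimizes w(S, V - S) / (|S| |V - S|). It then recurses on both sides until only singletons remain. The sketch below follows that recipe. As an assumption made purely for illustration, it replaces the alpha_n-approximation algorithm with exact brute-force search, which is exponential in n and only usable on toy inputs. The helper names and the nested-pair tree encoding are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding: a leaf is an int, an internal node is a (left, right) pair;
# similarities are a dict mapping frozenset({i, j}) -> w_ij.

def cut_weight(S, T, weights):
    """Total similarity crossing the cut (S, T); edges leaving S | T are ignored."""
    return sum(w for e, w in weights.items() if len(e & S) == 1 and len(e & T) == 1)

def uniform_sparsest_cut(nodes, weights):
    """Exact uniform sparsest cut by brute force (stand-in for an alpha_n-approximation).

    Minimizes w(S, V - S) / (|S| * |V - S|) over nonempty proper subsets S.
    """
    nodes = sorted(nodes)
    anchor, rest = nodes[0], nodes[1:]
    best = None
    # Keep `anchor` in S so each unordered cut is enumerated once; T stays nonempty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            S = frozenset((anchor, *extra))
            T = frozenset(nodes) - S
            sparsity = cut_weight(S, T, weights) / (len(S) * len(T))
            if best is None or sparsity < best[0]:  # ties keep the first cut found
                best = (sparsity, S, T)
    return best[1], best[2]

def recursive_sparsest_cut(nodes, weights):
    """Top-down heuristic: split with a sparsest cut, recurse until singletons."""
    nodes = frozenset(nodes)
    if len(nodes) == 1:
        return next(iter(nodes))
    S, T = uniform_sparsest_cut(nodes, weights)
    return (recursive_sparsest_cut(S, weights), recursive_sparsest_cut(T, weights))

# Two unit-weight triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weights = {frozenset(e): 1.0 for e in edges}
tree = recursive_sparsest_cut(range(6), weights)
print(tree)  # ((0, (1, 2)), (3, (4, 5)))
```

The first cut separates the two triangles (sparsity 1/9). Inside each triangle every cut has sparsity 1, so the split there is decided by enumeration order. Swapping in a polynomial-time approximate sparsest cut routine would leave the recursion unchanged.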
