Hierarchical Clustering via Spreading Metrics
Aurko Roy, Sebastian Pokutta

TL;DR
This paper introduces an improved $O(\log n)$-approximation algorithm for hierarchical clustering based on spreading metrics, with theoretical guarantees and practical benefits over existing methods.
Contribution
It provides the first $O(\log n)$-approximation algorithm for hierarchical clustering using spreading metrics, with a new ILP formulation and sphere growing technique.
Findings
The algorithm improves the approximation ratio from the $O(\log^{3/2} n)$ of the top-down Sparsest Cut approach to $O(\log n)$.
Hierarchies produced often have improved flat cluster projections.
The approach extends to a generalized cost function with similar guarantees.
Abstract
We study the cost function for hierarchical clusterings introduced by [arXiv:1510.05043] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical clustering of cost at most $O(\alpha_n \log n)$ times the cost of the optimal hierarchical clustering, where $\alpha_n$ is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top-down algorithm returns a hierarchical clustering of cost at most $O(\log^{3/2} n)$ times the cost of the optimal solution. We improve this by giving an $O(\log n)$-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics…
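To make the cost function concrete, here is a minimal Python sketch. It assumes the cost of [arXiv:1510.05043] charges each pair of points its similarity weight times the number of leaves under the pair's lowest common ancestor, and that the generalized version replaces that leaf count by a function `f` of it. The nested-tuple tree encoding, the names `leaves`, `hierarchy_cost`, `weight`, and the toy similarities are illustrative assumptions, not taken from the paper. The sketch only evaluates the cost of a given hierarchy; it does not implement the LP rounding or sphere growing.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaves of a nested-tuple tree; any non-tuple is a leaf."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def hierarchy_cost(tree, weight, f=lambda size: size):
    """Sum over pairs {i, j} of weight(i, j) * f(#leaves under lca(i, j)).

    With the default f (identity) this is the assumed basic cost;
    passing another increasing f gives the assumed generalized cost.
    """
    if not isinstance(tree, tuple):
        return 0  # a single leaf contains no pairs
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(ls) for ls in child_leaves)
    total = 0
    # Pairs separated at this node have this node as their LCA.
    for left, right in combinations(child_leaves, 2):
        for i in left:
            for j in right:
                total += weight(i, j) * f(size)
    # Pairs that stay together are charged further down the tree.
    for child in tree:
        total += hierarchy_cost(child, weight, f)
    return total


# Toy similarities (hypothetical): a~b and c~d are strongly similar.
sim = {
    frozenset(pair): w
    for pair, w in [
        (("a", "b"), 3), (("c", "d"), 3),
        (("a", "c"), 1), (("a", "d"), 1),
        (("b", "c"), 1), (("b", "d"), 1),
    ]
}


def weight(i, j):
    return sim[frozenset((i, j))]


good = (("a", "b"), ("c", "d"))  # similar points merged low in the tree
bad = (("a", "c"), ("b", "d"))   # similar points separated at the root

print(hierarchy_cost(good, weight))  # 28
print(hierarchy_cost(bad, weight))   # 36
```

The hierarchy that separates dissimilar points first gets the lower cost, which is the behaviour a cost defined on the whole tree, rather than on flat projections, is meant to reward.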
Taxonomy
Topics: Complexity and Algorithms in Graphs · Data Management and Algorithms · Computational Geometry and Mesh Generation
