From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering
Ines Chami, Albert Gu, Vaggos Chatziafratis, Christopher Ré

TL;DR
This paper introduces HypHC, a novel continuous relaxation approach for hierarchical clustering that uses hyperbolic embeddings to optimize tree structures with provable guarantees, outperforming traditional heuristics.
Contribution
HypHC provides the first continuous relaxation of Dasgupta's hierarchical clustering problem with theoretical approximation guarantees and demonstrates superior empirical performance.
Findings
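After decoding, the global minimizer of HypHC's continuous relaxation yields a discrete tree whose Dasgupta cost is within a (1 + epsilon) factor of the optimal tree's cost.
Even approximate solutions found with gradient descent give better clustering quality than agglomerative heuristics such as Average-Linkage.
Because the relaxation is differentiable, HypHC can be trained end-to-end as part of a downstream task such as classification.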
Abstract
Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning algorithm that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees. The key idea of our method, HypHC, is showing a direct correspondence from discrete trees to continuous representations (via the hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm that maps leaf embeddings to a dendrogram), allowing us to search the space of discrete binary trees with continuous optimization. Building on analogies between trees and hyperbolic space, we derive a continuous analogue for the…
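To make the "global cost function" concrete, here is a minimal sketch of Dasgupta's cost on a toy example: each pair of leaves contributes its similarity times the number of leaves under the pair's lowest common ancestor, so similar points should be merged low in the tree. It only evaluates the discrete objective and does not implement HypHC's hyperbolic relaxation or decoding. The nested-tuple tree encoding, the function names `leaves` and `dasgupta_cost`, and the similarity values are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels of a binary tree encoded as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def dasgupta_cost(tree, sim):
    """Dasgupta's cost: sum over leaf pairs of sim * (#leaves under their LCA)."""
    if not isinstance(tree, tuple):
        return 0  # a single leaf separates no pairs
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    size = len(left_leaves) + len(right_leaves)
    # Pairs split at this node have this node as their lowest common ancestor.
    split = sum(sim[frozenset((i, j))] for i in left_leaves for j in right_leaves)
    return size * split + dasgupta_cost(left, sim) + dasgupta_cost(right, sim)


# Toy similarities (assumed): pairs {0,1} and {2,3} are strongly similar.
sim = {frozenset(pair): 1 for pair in combinations(range(4), 2)}
sim[frozenset((0, 1))] = 4
sim[frozenset((2, 3))] = 4

good = ((0, 1), (2, 3))  # merges similar points first
bad = ((0, 2), (1, 3))   # merges dissimilar points first

print(dasgupta_cost(good, sim), dasgupta_cost(bad, sim))
```

The tree that groups the similar pairs first has the lower cost, which is the behavior the objective is designed to reward.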
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models
