Hierarchical Linkage Clustering Beyond Binary Trees and Ultrametrics
Maximilien Dreveton, Matthias Grossglauser, Daichi Kuroda, Patrick Thiran

TL;DR
This paper introduces a framework for hierarchical clustering that addresses the limitations of classical linkage methods: it defines valid hierarchies, allows non-binary structures, and provides an algorithm that recovers the most informative (finest) hierarchy consistent with the similarities in the data.
Contribution
It formalizes the concept of valid hierarchies, proves the existence of a finest such hierarchy, and presents a pruning algorithm applicable to classical linkage methods.
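To make the idea of a non-binary hierarchy concrete, the following Python sketch (standard library only) runs a naive single, complete, or average linkage and then collapses merges that occur at the same height as their parent into one multi-child node. This is not the authors' pruning algorithm, which this summary does not specify. The tie-collapsing rule and the helper names `linkage_merges`, `collapse_ties`, and `children` are hypothetical illustrations. The sketch assumes a symmetric dissimilarity matrix and a linkage without inversions, which holds for single, complete, and average linkage.

```python
from itertools import combinations

# Hypothetical helpers for illustration; not the paper's pruning algorithm.


def linkage_merges(D, method="average"):
    """Naive agglomerative clustering on a dissimilarity matrix D.

    Returns the merges in order as (height, merged_cluster) pairs.
    """
    n = len(D)
    clusters = [frozenset([i]) for i in range(n)]
    merges = []

    def dist(A, B):
        vals = [D[a][b] for a in A for b in B]
        if method == "single":
            return min(vals)
        if method == "complete":
            return max(vals)
        return sum(vals) / len(vals)  # average linkage (UPGMA)

    while len(clusters) > 1:
        # Merge the closest pair; ties go to the first pair enumerated.
        A, B = min(combinations(clusters, 2), key=lambda p: dist(*p))
        h = dist(A, B)
        clusters.remove(A)
        clusters.remove(B)
        clusters.append(A | B)
        merges.append((h, A | B))
    return merges


def collapse_ties(merges, tol=1e-9):
    """Drop each merged cluster whose height equals its parent's height."""
    kept = []
    for h, C in merges:
        parents = [(hs, S) for hs, S in merges if C < S]
        if parents:
            # The parent is the smallest merged cluster strictly containing C.
            hp, _ = min(parents, key=lambda p: len(p[1]))
            if abs(hp - h) <= tol:
                continue  # absorbed into its parent node
        kept.append((h, C))
    return kept


def children(C, kept, n):
    """Maximal kept clusters (or singletons) strictly contained in C."""
    nodes = [S for _, S in kept] + [frozenset([i]) for i in range(n)]
    inside = [S for S in nodes if S < C]
    return [S for S in inside if not any(S < T for T in inside)]


D = [[0, 1, 1, 5],
     [1, 0, 1, 5],
     [1, 1, 0, 5],
     [5, 5, 5, 0]]
merges = linkage_merges(D, "average")
kept = collapse_ties(merges)
for h, C in kept:
    print(h, sorted(C), [sorted(S) for S in children(C, kept, len(D))])
# 1.0 [0, 1, 2] [[0], [1], [2]]
# 5.0 [0, 1, 2, 3] [[0, 1, 2], [3]]
```

Here points 0, 1 and 2 are mutually equidistant, so a binary dendrogram must split them in an arbitrary order. Collapsing the tied merge yields a single ternary node, a structure a binary tree cannot express.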
Findings
Classical single, complete, and average linkage satisfy the conditions required for exact hierarchy recovery.
The proposed algorithm can recover the finest valid hierarchy from data.
Ward's linkage does not satisfy these conditions and may therefore fail to recover the finest valid hierarchy.
Abstract
Hierarchical clustering seeks to uncover nested structures in data by constructing a tree of clusters, where deeper levels reveal finer-grained relationships. Traditional methods, including linkage approaches, face three major limitations: (i) they always return a hierarchy, even if none exists, (ii) they are restricted to binary trees, even if the true hierarchy is non-binary, and (iii) they are highly sensitive to the choice of linkage function. In this paper, we address these issues by introducing the notion of a valid hierarchy and defining a partial order over the set of valid hierarchies. We prove the existence of a finest valid hierarchy, that is, the hierarchy that encodes the maximum information consistent with the similarity structure of the data set. In particular, the finest valid hierarchy is not constrained to binary structures and, when no hierarchical relationships…
Taxonomy
Topics: Data Quality and Management · Advanced Clustering Algorithms Research · Advanced Graph Neural Networks
