Hierarchical Clustering better than Average-Linkage
Moses Charikar, Vaggos Chatziafratis, Rad Niazadeh

TL;DR
This paper shows that, for two recently proposed hierarchical clustering objectives, average-linkage does no better in the worst case than a random solution, and it introduces new semidefinite programming algorithms that outperform it.
Contribution
The paper provides tight worst-case bounds for average-linkage and proposes new SDP-based algorithms with better approximation guarantees.
Findings
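In the worst case, average-linkage cannot achieve better than a 1/3 approximation for the similarity-based HC objective of Moseley and Wang (2017), nor better than a 2/3 approximation for the dissimilarity-based HC objective of Cohen-Addad et al. (2018).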
Tight counterexamples establish these worst-case bounds, settling an open question raised by Moseley and Wang (2017).
New SDP algorithms achieve better approximation ratios than average-linkage.
Abstract
Hierarchical Clustering (HC) is a widely studied problem in exploratory data analysis, usually tackled by simple agglomerative procedures like average-linkage, single-linkage or complete-linkage. In this paper we focus on two objectives, introduced recently to give insight into the performance of average-linkage clustering: a similarity based HC objective proposed by [Moseley and Wang, 2017] and a dissimilarity based HC objective proposed by [Cohen-Addad et al., 2018]. In both cases, we present tight counterexamples showing that average-linkage cannot obtain better than 1/3 and 2/3 approximations respectively (in the worst-case), settling an open question raised in [Moseley and Wang, 2017]. This matches the approximation ratio of a random solution, raising a natural question: can we beat average-linkage for these objectives? We answer this in the affirmative, giving two new algorithms…
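To make the two objectives concrete, here is a minimal Python sketch that evaluates both on a given binary tree. It assumes the forms in which these objectives are commonly stated: for each pair (i, j) with weight w_ij, the similarity-based objective rewards w_ij * (n - |leaves(lca(i, j))|), and the dissimilarity-based objective rewards w_ij * |leaves(lca(i, j))|, where lca(i, j) is the lowest common ancestor of i and j in the tree and n is the number of data points. The nested-tuple tree encoding and the names `leaves` and `hc_objectives` are illustrative choices, not taken from the paper.

```python
def leaves(tree):
    """Return the leaf labels under a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def hc_objectives(tree, w):
    """Return (similarity_value, dissimilarity_value) of a binary tree.

    tree: a leaf label, or a pair (left_subtree, right_subtree).
    w:    dict mapping frozenset({i, j}) to a nonnegative weight;
          pairs missing from the dict are treated as weight 0.
    Assumed objective forms (see the lead-in): each pair contributes
    w_ij * (n - size) to the similarity value and w_ij * size to the
    dissimilarity value, where size is the number of leaves under the
    node at which i and j are first separated.
    """
    n = len(leaves(tree))
    sim = 0.0
    dis = 0.0

    def visit(node):
        nonlocal sim, dis
        if not isinstance(node, tuple):
            return
        left, right = node
        left_leaves, right_leaves = leaves(left), leaves(right)
        size = len(left_leaves) + len(right_leaves)
        # Pairs split at this node have this node as their lowest common ancestor.
        for i in left_leaves:
            for j in right_leaves:
                w_ij = w.get(frozenset((i, j)), 0.0)
                sim += w_ij * (n - size)
                dis += w_ij * size
        visit(left)
        visit(right)

    visit(tree)
    return sim, dis


# Toy instance: two unit-weight pairs, (a, b) and (c, d).
weights = {frozenset(("a", "b")): 1.0, frozenset(("c", "d")): 1.0}
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "c"), "b"), "d")
print(hc_objectives(balanced, weights))
print(hc_objectives(caterpillar, weights))
```

On this toy instance, the tree that merges the weighted pairs early scores higher on the similarity value, while the tree that keeps them apart until late scores higher on the dissimilarity value, which is the behavior each objective is meant to reward when the weights are read as similarities or dissimilarities respectively.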
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Complex Network Analysis Techniques
