New bounds on the cohesion of complete-link and other linkage methods for agglomerative clustering
Sanjoy Dasgupta, Eduardo Laber

TL;DR
This paper improves theoretical bounds on the maximum diameter of clusters produced by complete-link hierarchical clustering, highlighting its advantages over single-link in producing compact clusters, and extends bounds to other linkage methods.
Contribution
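It provides improved bounds on the maximum cluster diameter produced by complete-link in metric spaces, and upper bounds on the cohesion of a class of linkage methods that includes average-link, clarifying how these methods compare when the goal is compact clusters.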
Findings
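One of the new bounds separates complete-link from single-link in terms of approximation for the maximum diameter, which existing bounds could not do.
This separation corroborates the common perception that complete-link is more suitable than single-link for producing compact clusters.
The same techniques yield upper bounds on the cohesion of a class of linkage methods that includes average-link.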
Abstract
Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance, the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces. One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is producing compact clusters. We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.
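To make the quantity in the abstract concrete, the following is a minimal sketch (not the authors' code) of naive agglomerative clustering with single-link and complete-link, followed by a computation of the maximum cluster diameter. The function names, the quadratic-time merge loop, and the six-point example on a line are all assumptions chosen for illustration; the example only shows the chaining behavior of single-link on one small instance and says nothing about the paper's bounds.

```python
import math
from itertools import combinations

def single_link(a, b):
    # Distance between clusters = closest pair of points across them.
    return min(math.dist(p, q) for p in a for q in b)

def complete_link(a, b):
    # Distance between clusters = farthest pair of points across them.
    return max(math.dist(p, q) for p in a for q in b)

def agglomerate(points, k, linkage):
    # Start with singletons and repeatedly merge the two clusters that
    # minimize the linkage distance, until only k clusters remain.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def max_diameter(clusters):
    # Diameter of a cluster = largest pairwise distance inside it
    # (0 for a singleton); report the largest diameter over all clusters.
    return max(
        max((math.dist(p, q) for p, q in combinations(c, 2)), default=0.0)
        for c in clusters
    )

# Hypothetical instance: points on a line with slowly growing gaps.
pts = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.3, 0.0), (4.6, 0.0), (6.0, 0.0)]
single_diam = max_diameter(agglomerate(pts, 2, single_link))
complete_diam = max_diameter(agglomerate(pts, 2, complete_link))
print(single_diam, complete_diam)
```

On this instance single-link chains the first five points into one cluster, while complete-link ends with a cluster of four points and a cluster of two, so its maximum diameter is smaller.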
Taxonomy
Topics: Facility Location and Emergency Management
