TL;DR
This paper introduces a variable-group algorithm for agglomerative hierarchical clustering. It removes the non-uniqueness that tied distances cause in pair-group methods by merging all tied clusters at once, and it represents the result as a multidendrogram.
Contribution
It proposes a variable-group clustering algorithm and a multidendrogram representation that handle ties between distances without arbitrary tie-breaking decisions.
Findings
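The variable-group algorithm yields a single hierarchical classification even when two or more inter-cluster distances coincide.
Multidendrograms give a tree representation in which more than two clusters can be joined at the same step.
A generalization of Lance and Williams' formula enables a recursive implementation of the algorithm.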
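For context, the classic pair-group recurrence of Lance and Williams, which the paper generalizes, can be written as below. The symbols (clusters X_i, X_j, X_k and coefficients alpha, beta, gamma) are notation assumed here for illustration; the paper's generalized variable-group formula is not reproduced in this summary.

```latex
% Classic (pair-group) Lance--Williams update after merging X_i and X_j:
D(X_i \cup X_j,\, X_k)
  = \alpha_i\, D(X_i, X_k) + \alpha_j\, D(X_j, X_k)
  + \beta\, D(X_i, X_j)
  + \gamma\, \bigl| D(X_i, X_k) - D(X_j, X_k) \bigr|

% Example: single linkage uses \alpha_i = \alpha_j = 1/2, \beta = 0, \gamma = -1/2,
% which reduces to \min\{ D(X_i, X_k),\, D(X_j, X_k) \}.
```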
Abstract
In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach for solving this drawback has been to take any arbitrary criterion in order to break ties between distances, which results in different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that consists in grouping more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance and Williams' formula which enables the implementation of the algorithm in a recursive way.
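The tie-handling idea in the abstract can be illustrated with a minimal sketch of a single merge step. It assumes that clusters tied at the minimum distance are grouped by taking connected components of the "tied at minimum" relation; the function name `tied_groups`, the tolerance `tol`, and the toy data are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def tied_groups(dist, labels, tol=1e-12):
    """One variable-group step: find all clusters to merge at the minimum distance.

    dist   : symmetric (n, n) array of inter-cluster distances
    labels : list of n cluster names
    Returns (d_min, groups), where each group lists the clusters merged together.
    """
    n = len(labels)
    rows, cols = np.triu_indices(n, k=1)      # each unordered pair once
    d_min = float(dist[rows, cols].min())

    # Union-find over clusters; pairs tied at d_min are linked.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    for i, j in zip(rows.tolist(), cols.tolist()):
        if abs(dist[i, j] - d_min) <= tol:
            parent[find(i)] = find(j)

    # Collect connected components; singletons are not merged at this step.
    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(labels[i])
    groups = [g for g in components.values() if len(g) > 1]
    return d_min, groups

# Toy example: d(a,b) = d(b,c) = 1 is a tie, so a, b, c merge together.
labels = ["a", "b", "c", "d"]
dist = np.array([
    [0.0, 1.0, 2.0, 5.0],
    [1.0, 0.0, 1.0, 5.0],
    [2.0, 1.0, 0.0, 5.0],
    [5.0, 5.0, 5.0, 0.0],
])
print(tied_groups(dist, labels))
```

A pair-group method would have to pick either (a, b) or (b, c) first, giving two different dendrograms; merging the whole tied group at once avoids that choice.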
