Maximum Likelihood Estimation for Single Linkage Hierarchical Clustering
Dekang Zhu, Dan P. Guralnik, Xuezhi Wang, Xiang Li, Bill Moran

TL;DR
This paper introduces a statistical model for estimating dendrogram structures in single linkage hierarchical clustering, accounting for measurement noise, and demonstrates its effectiveness through simulations.
Contribution
It introduces the concept of estimating the dendrogram structure in SLHC and describes an approximate maximum likelihood estimator (MLE) for it, aimed at greater robustness to measurement noise than standard SLHC.
Findings
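In a simple Monte Carlo simulation, the approximate MLE outperforms standard SLHC when the separation measurements are noisy.
The method estimates the dendrogram structure (the hierarchy of partitions, not the merge heights) from corrupted data.
The evidence so far covers small data sets only.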
Abstract
We derive a statistical model for estimation of a dendrogram from single linkage hierarchical clustering (SLHC) that takes account of uncertainty through noise or corruption in the measurements of separation of data. Our focus is on just the estimation of the hierarchy of partitions afforded by the dendrogram, rather than the heights in the latter. The concept of estimating this "dendrogram structure" is introduced, and an approximate maximum likelihood estimator (MLE) for the dendrogram structure is described. These ideas are illustrated by a simple Monte Carlo simulation that, at least for small data sets, suggests the method outperforms SLHC in the presence of noise.
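To make the notion of "dendrogram structure" concrete, here is a minimal Python sketch of the problem setting only. It is not the paper's estimator. It computes the chain of partitions that plain SLHC produces from a distance matrix (discarding heights), then estimates by Monte Carlo how often that chain survives when the distances are corrupted. The function names, the additive Gaussian noise model, and the parameter `sigma` are all assumptions made for illustration; the abstract does not specify the noise model.

```python
import random
from itertools import combinations


def dendrogram_structure(dist):
    """Chain of partitions produced by single linkage, with heights discarded."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def partition():
        blocks = {}
        for i in range(n):
            blocks.setdefault(find(i), set()).add(i)
        return frozenset(frozenset(b) for b in blocks.values())

    # Single linkage merges clusters in order of increasing pairwise distance.
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    chain = [partition()]
    k = 0
    while k < len(edges):
        height = edges[k][0]
        merged = False
        # Edges tied at the same height are merged together as one level.
        while k < len(edges) and edges[k][0] == height:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                merged = True
            k += 1
        if merged:
            chain.append(partition())
    return chain


def noisy_copy(dist, sigma, rng):
    """Assumed noise model: independent Gaussian noise on each pairwise distance.

    Noisy values may become negative; only their ordering matters here.
    """
    n = len(dist)
    out = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        out[i][j] = out[j][i] = dist[i][j] + rng.gauss(0.0, sigma)
    return out


def structure_recovery_rate(dist, sigma, trials=1000, seed=0):
    """Fraction of noisy trials in which plain SLHC recovers the true structure."""
    rng = random.Random(seed)
    truth = dendrogram_structure(dist)
    hits = sum(
        dendrogram_structure(noisy_copy(dist, sigma, rng)) == truth
        for _ in range(trials)
    )
    return hits / trials
```

For four points on a line at positions 0, 1, 3, 7, `dendrogram_structure` returns four partitions: all singletons, then {0,1} merged, then {0,1,2}, then everything. Calling `structure_recovery_rate` on the same matrix with increasing `sigma` shows how readily noise changes the structure that plain SLHC reports, which is the failure mode the paper's approximate MLE is meant to address.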
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Complex Network Analysis Techniques
