Statistical analysis of a hierarchical clustering algorithm with outliers
Nicolas Klutchnikoff, Audrey Poterie (UBS), Laurent Rouviere

TL;DR
This paper introduces a new hierarchical clustering algorithm designed to effectively identify clusters despite the presence of outliers, backed by theoretical guarantees and empirical validation.
Contribution
A novel hierarchical clustering method with proven robustness to outliers, including theoretical performance bounds and empirical comparisons.
Findings
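An oracle-type inequality shows the algorithm recovers the true clusters with high probability under minimal assumptions on the outlier distribution
Consistency and rates of convergence follow from this inequality in various situations
Simulation studies assess performance and compare the method with classical clustering algorithms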
Abstract
It is well known that the classical single linkage algorithm usually fails to identify clusters in the presence of outliers. In this paper, we propose a new version of this algorithm and study its mathematical performance. In particular, we establish an oracle-type inequality which ensures that our procedure recovers the clusters with high probability under minimal assumptions on the distribution of the outliers. From this inequality we deduce the consistency and some rates of convergence of our algorithm in various situations. The performance of our approach is also assessed through simulation studies, and a comparison with classical clustering algorithms on simulated data is presented.
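The failure mode mentioned in the abstract's first sentence can be illustrated with a minimal sketch of classical single linkage. This is not the authors' modified algorithm, which the abstract does not specify. The function name `single_linkage`, the toy points, and the choice of two clusters are all assumptions made for illustration. The sketch merges the closest pair of clusters repeatedly (equivalently, adds the shortest edges Kruskal-style) until `k` clusters remain.

```python
from itertools import combinations
from math import dist


def single_linkage(points, k):
    """Classical single linkage: merge closest clusters until k remain.

    Returns one cluster label per point. Illustrative sketch only.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    n_clusters = len(points)
    for _, i, j in edges:
        if n_clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge the two clusters
            n_clusters -= 1
    return [find(i) for i in range(len(points))]


# Hypothetical toy data: two tight groups and one distant outlier.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 0), (10, 1), (11, 0), (11, 1)]
outlier = [(50, 50)]

clean = single_linkage(group_a + group_b, k=2)
noisy = single_linkage(group_a + group_b + outlier, k=2)

print("without outlier:", clean)
print("with outlier:   ", noisy)
```

Without the outlier, the two groups receive different labels. With the outlier added, the largest gap in the data is between the outlier and everything else, so asking for two clusters isolates the single outlier and merges both groups into one cluster.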
Taxonomy
Topics: Advanced Statistical Methods and Models · Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models
