Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm
Marek Gagolewski, Maciej Bartoszuk, Anna Cena

TL;DR
Genie is a fast, outlier-resistant hierarchical clustering algorithm that uses an economic inequity measure of the cluster sizes (e.g., the Gini index) to improve clustering quality while retaining speed and flexibility across various data types.
Contribution
It introduces a new linkage criterion based on economic inequity measures of cluster sizes. This makes clustering more robust to outliers than single linkage while remaining cheaper to compute than the other classical linkage criteria.
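To make the inequity measure concrete, the snippet below computes one common normalized form of the Gini index for k cluster sizes c_1, ..., c_k: G = (sum over i < j of |c_i - c_j|) / ((k - 1) * (c_1 + ... + c_k)). It is a minimal sketch that assumes this normalization. The function name `gini_index` is illustrative and is not taken from the authors' software.

```python
from typing import Sequence


def gini_index(sizes: Sequence[int]) -> float:
    """Normalized Gini index of cluster sizes (assumed form, see lead-in).

    Returns 0.0 for perfectly balanced sizes; values near 1.0 mean one
    cluster holds almost all points.
    """
    k, total = len(sizes), sum(sizes)
    if k <= 1 or total == 0:
        return 0.0
    # Sum of absolute size differences over all unordered pairs i < j
    diff = sum(abs(sizes[i] - sizes[j]) for i in range(k) for j in range(i + 1, k))
    return diff / ((k - 1) * total)
```

For example, four singleton clusters give 0.0, while sizes [2, 1, 1] give 2 / (2 * 4) = 0.25.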
Findings
Outperforms Ward and average linkage in clustering quality.
Retains the speed of single linkage with added robustness.
Easily parallelizable and applicable to diverse data types.
Abstract
The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, this constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, the single linkage algorithm is known to be very sensitive to outliers and to produce highly skewed dendrograms, so it usually does not reflect the true underlying data structure unless the clusters are well separated. To overcome these limitations, we propose a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini or Bonferroni index) of the cluster sizes does not drastically increase above a given threshold. The presented benchmarks indicate a high…
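The merge rule described in the abstract can be sketched as follows. This is our reading of the Genie procedure, not the authors' code. It assumes Euclidean points and builds a minimum spanning tree with a simple O(n^2) Prim's algorithm. It then processes the tree edges in increasing order, as single linkage would, with one exception: whenever the Gini index of the current cluster sizes exceeds the threshold, only edges touching a cluster of the smallest size are eligible. The names `mst_edges`, `genie`, and `gini_threshold`, and the default value 0.3, are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def gini_index(sizes: Sequence[int]) -> float:
    """Normalized Gini index of cluster sizes (0 = perfectly balanced)."""
    k, total = len(sizes), sum(sizes)
    if k <= 1 or total == 0:
        return 0.0
    diff = sum(abs(sizes[i] - sizes[j]) for i in range(k) for j in range(i + 1, k))
    return diff / ((k - 1) * total)


def mst_edges(points: Sequence[Point]) -> List[Tuple[float, int, int]]:
    """Euclidean minimum spanning tree via O(n^2) Prim; edges sorted by length."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # distance from each vertex to the current tree
    link = [-1] * n        # tree vertex realising that distance
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return sorted(edges)


def genie(points: Sequence[Point], n_clusters: int,
          gini_threshold: float = 0.3) -> List[int]:
    """Sketch of the Genie merge rule as we read it; not the authors' implementation."""
    n = len(points)
    unused = mst_edges(points)  # candidate merges, shortest first
    parent, size = list(range(n)), [1] * n

    def find(x: int) -> int:  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(n - n_clusters):  # each merge removes one cluster
        sizes = [size[r] for r in {find(i) for i in range(n)}]
        idx = 0  # default: plain single-linkage step (shortest edge)
        if gini_index(sizes) > gini_threshold:
            smallest = min(sizes)
            # Shortest remaining MST edge that touches a smallest cluster
            idx = next(i for i, (_, a, b) in enumerate(unused)
                       if min(size[find(a)], size[find(b)]) == smallest)
        _, a, b = unused.pop(idx)
        ra, rb = find(a), find(b)
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra  # union by size
        size[ra] += size[rb]

    remap = {}  # relabel roots as 0, 1, ... in order of first appearance
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(genie(pts, n_clusters=2))
```

Because every unused tree edge joins two distinct clusters, each pop performs a valid merge. With `gini_threshold=1.0` the Gini condition never triggers and the procedure reduces to single linkage. Lower thresholds push the algorithm towards more balanced cluster sizes.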
