Efficient Centroid-Linkage Clustering
MohammadHossein Bateni, Laxman Dhulipala, Willem Fletcher, Kishen N. Gowda, D Ellis Hershkowitz, Rajesh Jayaram, Jakub Łącki

TL;DR
This paper introduces an efficient approximation algorithm for Centroid-Linkage Hierarchical Clustering that significantly speeds up computation while maintaining clustering quality, supported by empirical evaluation.
Contribution
It presents a novel $c$-approximate clustering algorithm with a new dynamic data structure for nearest neighbor search, improving speed over existing methods.
Findings
Achieves up to a 36x speedup over a state-of-the-art exact baseline
Maintains the clustering quality of that exact baseline
Attributes the speedup to performing fewer distance comparisons
Abstract
We give an efficient algorithm for Centroid-Linkage Hierarchical Agglomerative Clustering (HAC), which computes a $c$-approximate clustering in roughly $n^{1+O(1/c^2)}$ time. We obtain our result by combining a new Centroid-Linkage HAC algorithm with a novel fully dynamic data structure for nearest neighbor search which works under adaptive updates. We also evaluate our algorithm empirically. By leveraging a state-of-the-art nearest-neighbor search library, we obtain a fast and accurate Centroid-Linkage HAC algorithm. Compared to an existing state-of-the-art exact baseline, our implementation maintains the clustering quality while delivering up to a $36\times$ speedup due to performing fewer distance comparisons.
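To make the problem concrete, here is a minimal brute-force sketch of centroid-linkage HAC in Python. It is not the paper's algorithm: it rescans all pairs of centroids at every step, which is exactly the cost that the paper's dynamic nearest-neighbor structure is meant to avoid. The function name `centroid_hac`, the merge-id convention, and the reading of "$c$-approximate" as "merge any pair whose centroid distance is at most $c$ times the current minimum" are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def centroid_hac(points, c=1.0):
    """Naive centroid-linkage HAC (illustrative sketch, hypothetical helper).

    Returns a list of merges (id_a, id_b, centroid_distance). Input point i is
    cluster i; the cluster created by the k-th merge gets id len(points) + k.
    With c > 1, any pair within a factor c of the closest pair may be merged
    (assumed meaning of "c-approximate").
    """
    # Active clusters: id -> (centroid, size).
    clusters = {i: (tuple(map(float, p)), 1) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        # Brute-force scan over all pairs of active centroids.
        pairs = [(dist(clusters[a][0], clusters[b][0]), a, b)
                 for a, b in combinations(sorted(clusters), 2)]
        best = min(pairs)[0]
        # Take the first pair whose distance is within a factor c of the minimum.
        d, a, b = next(p for p in pairs if p[0] <= c * best)
        (ca, na), (cb, nb) = clusters.pop(a), clusters.pop(b)
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        clusters[next_id] = (merged, na + nb)
        merges.append((a, b, d))
        next_id += 1
    return merges


if __name__ == "__main__":
    for merge in centroid_hac([(0, 0), (1, 0), (10, 0), (11, 0)]):
        print(merge)
```

With `c=1.0` the sketch performs exact centroid-linkage HAC; raising `c` only loosens which pair is allowed to merge. Any speedup would come from swapping the all-pairs scan for an approximate nearest-neighbor query, which this sketch does not attempt.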
Taxonomy
Topics: Bioinformatics and Genomic Networks
