TL;DR
Grinch is a scalable, non-greedy hierarchical clustering algorithm that builds cluster trees able to capture complex cluster structure. On benchmark and author coreference datasets it is more accurate than other scalable methods and orders of magnitude faster than hierarchical agglomerative clustering.
Contribution
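We introduce Grinch, an algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions. Its rotate and graft subroutines reconfigure the hierarchy as new points arrive, and we prove that it produces a cluster tree containing the ground-truth clustering whenever the model is consistent with that clustering.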
Findings
Grinch is more accurate than other scalable clustering methods on benchmark and author coreference datasets, with both standard and learned linkage functions.
It is orders of magnitude faster than hierarchical agglomerative clustering.
When the model is consistent with a ground-truth clustering, the cluster tree is guaranteed to contain that clustering, independent of data arrival order.
Abstract
We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the model is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing the ground-truth, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.
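Two ideas in the abstract can be made concrete with a small sketch: a linkage function that scores the similarity between two point sets, and the notion of a cluster tree "containing" a ground-truth clustering. The code below is illustrative only and is not the Grinch implementation. The function names, the nested-tuple tree representation, and the choice of average linkage over negative squared Euclidean distance are all assumptions made for this example, and "containing" is interpreted here as every ground-truth cluster being exactly the leaf set of some tree node.

```python
from itertools import product


def average_linkage(a, b):
    """Hypothetical linkage: mean pairwise similarity between two point sets.

    Similarity is taken to be negative squared Euclidean distance (an
    assumption for this sketch; any set-to-set similarity could be used).
    """
    total = 0.0
    for x, y in product(a, b):
        total += -sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return total / (len(a) * len(b))


def node_leaf_sets(tree, out=None):
    """Collect the leaf set of every node in a nested-tuple tree.

    A leaf is any non-tuple value; an internal node is a tuple of children.
    Returns (leaf set of `tree`, list of leaf sets for all nodes).
    """
    if out is None:
        out = []
    if isinstance(tree, tuple):
        leaves = frozenset()
        for child in tree:
            child_leaves, _ = node_leaf_sets(child, out)
            leaves |= child_leaves
    else:
        leaves = frozenset([tree])
    out.append(leaves)
    return leaves, out


def tree_contains_clustering(tree, clustering):
    """True if every cluster equals the leaf set of some node in the tree."""
    _, sets = node_leaf_sets(tree)
    nodes = set(sets)
    return all(frozenset(cluster) in nodes for cluster in clustering)


# Toy tree over five points: ((a, b), c) and (d, e) under a common root.
toy_tree = ((("a", "b"), "c"), ("d", "e"))
in_tree = tree_contains_clustering(toy_tree, [{"a", "b", "c"}, {"d", "e"}])
not_in_tree = tree_contains_clustering(toy_tree, [{"a", "b"}, {"c", "d", "e"}])
```

In this toy tree the clustering {a, b, c}, {d, e} is contained because both clusters appear as subtrees, while {a, b}, {c, d, e} is not, since no node has exactly the leaves c, d, and e.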
