Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth
Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirrokni, Jessica Shi

TL;DR
This paper introduces ParHAC, a parallel hierarchical agglomerative clustering algorithm with sublinear depth that scales efficiently to large datasets, providing near-optimal clustering quality and significant speedups over traditional methods.
Contribution
The paper presents the first efficient parallel HAC algorithm with poly-logarithmic depth and provides a $(1+\epsilon)$-approximation for average-linkage clustering on large graphs.
Findings
Achieves a 50.1x speedup over the best sequential baseline.
Can cluster a 124-billion-edge graph in just over three hours.
Maintains clustering quality comparable to exact HAC.
Abstract
Obtaining scalable algorithms for hierarchical agglomerative clustering (HAC) is of significant interest due to the massive size of real-world datasets. At the same time, efficiently parallelizing HAC is difficult due to the seemingly sequential nature of the algorithm. In this paper, we address this issue and present ParHAC, the first efficient parallel HAC algorithm with sublinear depth for the widely-used average-linkage function. In particular, we provide a $(1+\epsilon)$-approximation algorithm for this problem on $m$-edge graphs using $\tilde{O}(m)$ work and poly-logarithmic depth. Moreover, we show that obtaining similar bounds for exact average-linkage HAC is not possible under standard complexity-theoretic assumptions. We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare…
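To make the notion of an approximate average-linkage merge concrete, here is a minimal sequential sketch in Python. It is not ParHAC and makes no attempt at parallelism or near-linear work; it only illustrates the merge rule, assuming the usual definition that a merge is acceptable when its similarity is at least the current maximum divided by $(1+\epsilon)$, and that the average-linkage similarity of two clusters is the total weight of edges between them divided by the product of their sizes. The function name `approx_average_linkage_hac` and its parameters are hypothetical, chosen for this illustration.

```python
def approx_average_linkage_hac(n, edges, epsilon=0.0):
    """Sequential illustration of (1+epsilon)-approximate average-linkage HAC.

    n       -- number of vertices, labelled 0..n-1
    edges   -- iterable of (u, v, w) with similarity weight w > 0
    epsilon -- 0.0 gives exact HAC; larger values relax the merge rule
    Returns a list of merges (a, b, new_cluster_id, similarity).
    """
    size = {v: 1 for v in range(n)}
    # cut[a][b] = total weight of edges between clusters a and b
    cut = {v: {} for v in range(n)}
    for u, v, w in edges:
        if u == v:
            continue  # self-loops do not affect inter-cluster similarity
        cut[u][v] = cut[u].get(v, 0.0) + w
        cut[v][u] = cut[v].get(u, 0.0) + w

    merges = []
    next_id = n
    while True:
        # Average-linkage similarity of every adjacent cluster pair.
        pairs = [(cut[a][b] / (size[a] * size[b]), a, b)
                 for a in cut for b in cut[a] if a < b]
        if not pairs:
            break  # no edges left between clusters
        best = max(s for s, _, _ in pairs)
        threshold = best / (1.0 + epsilon)
        # Any pair at or above the threshold is a valid approximate merge.
        # Here we take the eligible pair with the smallest ids (an arbitrary
        # choice made only to show that non-maximal merges are allowed).
        s, a, b = min((p for p in pairs if p[0] >= threshold),
                      key=lambda p: (p[1], p[2]))

        c = next_id
        next_id += 1
        size[c] = size[a] + size[b]
        cut[c] = {}
        for x in (set(cut[a]) | set(cut[b])) - {a, b}:
            w = cut[a].get(x, 0.0) + cut[b].get(x, 0.0)
            cut[x].pop(a, None)
            cut[x].pop(b, None)
            cut[x][c] = w
            cut[c][x] = w
        del cut[a], cut[b]
        merges.append((a, b, c, s))
    return merges


# Toy path graph 0 - 1 - 2 - 3 (weights are made up for illustration).
edges = [(0, 1, 0.9), (1, 2, 0.1), (2, 3, 1.0)]
exact = approx_average_linkage_hac(4, edges, epsilon=0.0)
approx = approx_average_linkage_hac(4, edges, epsilon=0.2)
```

With `epsilon=0.0` the first merge is the heaviest pair `(2, 3)`; with `epsilon=0.2` the pair `(0, 1)` is also eligible because 0.9 is at least 1.0 / 1.2, so the sketch merges it first. The freedom to accept any such near-maximal merge is what allows many merges to be carried out at once in a parallel setting.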
Taxonomy
Topics: Advanced Clustering Algorithms Research · Municipal Solid Waste Management · Complex Network Analysis Techniques
