Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under $\ell_p$-Distances
Grigory Yaroslavtsev, Adithya Vadapalli

TL;DR
This paper introduces massively parallel (MPC) algorithms for single-linkage clustering under Hamming and several $\ell_p$ distances, pairs them with hardness-of-approximation results, and reports an Apache Spark implementation with speedups of several orders of magnitude.
Contribution
It presents the first $O(\log n)$-round MPC algorithms for single-linkage clustering with approximation guarantees, and establishes constant-factor inapproximability results for algorithms that use fewer rounds, under standard MPC hardness assumptions.
Findings
Algorithms run in $O(\log n)$ rounds with $(1+\epsilon)$-approximation
Exact algorithm for Hamming distance
Speedups of several orders of magnitude in experiments with an Apache Spark implementation
Abstract
We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under Hamming, $\ell_1$, $\ell_2$ and $\ell_\infty$ distances. All our algorithms run in $O(\log n)$ rounds of MPC for any fixed $d$ and achieve $(1+\epsilon)$-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor inapproximability results for $o(\log n)$-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). Efficiency of implementation of our algorithms in Apache Spark is demonstrated through experiments on a variety of datasets exhibiting speedups of several orders of magnitude.
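For readers unfamiliar with the problem itself, the following is a minimal sequential sketch of single-linkage clustering. It is not the paper's MPC algorithm. It relies on the standard fact that single-linkage clustering can be obtained by running Kruskal's minimum-spanning-tree algorithm on the complete distance graph and stopping once $k$ components remain. The function names `lp_distance` and `single_linkage` and the parameter `k` are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def lp_distance(u, v, p):
    """l_p distance between two equal-length vectors.

    Pass p=float('inf') for the l_infinity distance. On 0/1 vectors,
    p=1 coincides with the Hamming distance.
    """
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def single_linkage(points, k, p=2):
    """Illustrative sequential single-linkage clustering into k clusters.

    Runs Kruskal's algorithm over all pairwise distances and stops when
    k components remain. Takes quadratic memory in the number of points,
    so it is a reference sketch only, not a parallel algorithm.
    Returns one cluster label (a representative index) per point.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, sorted by increasing distance.
    edges = sorted(
        (lp_distance(points[i], points[j], p), i, j)
        for i, j in combinations(range(n), 2)
    )

    clusters = n
    for _, i, j in edges:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge the two closest components
            clusters -= 1

    return [find(i) for i in range(n)]
```

Calling `single_linkage([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` places the first two points in one cluster and the last two in another. The sketch enumerates all pairs of points, which is the quadratic cost that makes the problem hard to scale on a single machine.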
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Data Mining Algorithms and Applications
