Fully Scalable MPC Algorithms for Clustering in High Dimension
Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, Pavel Veselý

TL;DR
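This paper introduces fully scalable parallel algorithms for high-dimensional Euclidean clustering in the MPC model. They achieve constant-factor approximations in a constant number of rounds for uniform facility location, k-Median, and k-Means, even when each machine's local memory is far smaller than the number of clusters.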
Contribution
It presents the first fully scalable MPC algorithms with constant rounds for high-dimensional clustering, including a novel geometric aggregation primitive based on consistent hashing.
Findings
First fully scalable MPC algorithm for $O(1)$-approximate facility location.
Constant-round MPC algorithms for $O(1)$-approximate k-Median and k-Means.
Development of a new MPC primitive for geometric neighborhood statistics.
Abstract
We design new parallel algorithms for clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine may be $n^{\sigma}$ for arbitrarily small fixed $\sigma > 0$. Importantly, the local memory may be substantially smaller than the number of clusters $k$, yet all our algorithms are fast, i.e., run in $O(1)$ rounds. We first devise a fast MPC algorithm for $O(1)$-approximation of uniform facility location. This is the first fully scalable MPC algorithm that achieves $O(1)$-approximation for any clustering problem in a general geometric setting; previous algorithms only provide $\mathrm{poly}(\log n)$-approximation or apply to restricted inputs, like low dimension or a small number of clusters $k$; e.g. [Bhaskara and Wijewardena, ICML'18; Cohen-Addad et al.,…
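For readers unfamiliar with the objective named in the abstract, the following is a minimal sketch of the standard uniform facility location cost: every opened facility pays the same opening cost, and every point pays its Euclidean distance to the nearest open facility. The function name, parameter names, and sample points are illustrative assumptions and do not come from the paper; the code is a plain sequential evaluation of the objective, not the paper's MPC algorithm.

```python
import math


def uniform_facility_location_cost(points, facilities, opening_cost):
    """Evaluate the uniform facility location objective (hypothetical helper).

    cost = opening_cost * (number of open facilities)
         + sum over points of the distance to the nearest open facility
    """
    if not facilities:
        raise ValueError("at least one facility must be opened")
    # Connection cost: each point connects to its nearest open facility.
    connection = sum(min(math.dist(p, f) for f in facilities) for p in points)
    return opening_cost * len(facilities) + connection


# Illustrative data (not from the paper): two nearby points and one far point.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0)]
cost_one = uniform_facility_location_cost(points, [(0.0, 0.0)], opening_cost=2.0)
cost_two = uniform_facility_location_cost(
    points, [(0.0, 0.0), (10.0, 0.0)], opening_cost=2.0
)
print(cost_one, cost_two)
```

In this toy instance, opening a second facility at the far point raises the opening cost but lowers the total, which is the trade-off the objective captures: the number of facilities is not fixed in advance, unlike in k-Median and k-Means.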
