Scaling Distributed All-Pairs Algorithms: Manage Computation and Limit Data Replication with Quorums
Cory J. Kleinheksel, Arun K. Somani

TL;DR
This paper introduces cyclic quorum sets for efficient distributed all-pairs computations, reducing data replication and memory usage while improving scalability and speed on real datasets.
Contribution
Proposes and proves the effectiveness of cyclic quorum sets for managing all-pairs computations with smaller quorums, enhancing scalability and efficiency.
Findings
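Quorum sizes are O(N/√P), up to 50% smaller than dual N/√P array implementations and significantly smaller than solutions that require all data on every node.
Achieved a 7x speedup on 8 nodes with one-third the memory usage per process.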
Demonstrated scalability on real datasets.
Abstract
In this paper, we propose and prove that cyclic quorum sets can efficiently manage all-pairs computations and data replication. The quorums are O(N/sqrt(P)) in size, up to 50% smaller than the dual N/sqrt(P) array implementations, and significantly smaller than solutions requiring all data. Implementation evaluation demonstrated scalability on real datasets, with a 7x speedup on 8 nodes and one-third of the memory usage per process. The all-pairs problem requires all data elements to be paired with all other data elements. All-pairs problems occur in many science fields, which has sustained interest in them. Additionally, as datasets grow in size, new methods like these that can reduce memory footprints and distribute work equally across compute nodes will be demanded.
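To make the idea of a cyclic quorum set concrete, the sketch below builds one from a base set and checks that every pair of data blocks lands together in at least one quorum. It is a minimal illustration, not the authors' implementation. The choice of P = 7 with base set {0, 1, 3} and the function names `cyclic_quorums`, `is_difference_cover`, and `covers_all_pairs` are assumptions made for this example. It also assumes the dataset is split into P blocks, so a process holding a quorum of k blocks stores roughly k·N/P elements.

```python
from itertools import combinations

def cyclic_quorums(base, p):
    """Quorum i is the base set shifted by i modulo p (hypothetical helper)."""
    return [sorted((a + i) % p for a in base) for i in range(p)]

def is_difference_cover(base, p):
    """True if every residue mod p is a difference of two base elements."""
    diffs = {(a - b) % p for a in base for b in base}
    return diffs == set(range(p))

def covers_all_pairs(quorums, p):
    """True if every pair of block indices (x, y), x <= y, shares a quorum."""
    covered = set()
    for q in quorums:
        # q is sorted, so combinations() yields pairs as (smaller, larger).
        covered.update(combinations(q, 2))
        covered.update((x, x) for x in q)
    needed = {(x, y) for x in range(p) for y in range(x, p)}
    return needed <= covered

# Illustrative parameters (assumed, not taken from the paper).
P = 7
BASE = (0, 1, 3)
quorums = cyclic_quorums(BASE, P)

if __name__ == "__main__":
    for i, q in enumerate(quorums):
        print(f"process {i} holds blocks {q}")
    print("all pairs covered:", covers_all_pairs(quorums, P))
```

In this example each process holds 3 of the 7 blocks rather than the full dataset, yet any two blocks can still be compared on some process, because the differences of the base set cover every residue modulo P.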
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Distributed systems and fault tolerance
