Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms
Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Jason Maassen, Henri Bal, Rob van Nieuwpoort

TL;DR
This paper introduces Rocket, a scalable and efficient framework for all-pairs computations on heterogeneous platforms, leveraging hierarchical caching, dynamic load balancing, and asynchronous processing to achieve high performance and scalability.
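Hierarchical caching means a lookup is served by the fastest level that already holds the item, and storage is read only when every level misses. The Python sketch below illustrates that idea under stated assumptions: two levels (standing in for, say, GPU memory backed by host memory), least-recently-used eviction, made-up capacities, and the hypothetical names `LRUCache`, `HierarchicalCache`, and `load`. It is not Rocket's implementation, and the paper's actual level layout and eviction policy may differ.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable, Hashable

_MISSING = object()  # sentinel so that a cached None is not mistaken for a miss


class LRUCache:
    """One cache level (hypothetical name) holding at most `capacity` items."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict[Hashable, object] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        return _MISSING

    def put(self, key: Hashable, value: object) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used item


class HierarchicalCache:
    """Checks levels from fastest to slowest; loads from storage only on a miss everywhere."""

    def __init__(self, levels: list[LRUCache], load: Callable[[Hashable], object]) -> None:
        self.levels = levels  # ordered from fastest/smallest to slowest/largest
        self.load = load
        self.loads = 0

    def get(self, key: Hashable) -> object:
        for depth, level in enumerate(self.levels):
            value = level.get(key)
            if value is not _MISSING:
                for faster in self.levels[:depth]:
                    faster.put(key, value)  # promote into the faster levels that missed
                return value
        value = self.load(key)  # missed at every level: read from storage
        self.loads += 1
        for level in self.levels:
            level.put(key, value)
        return value


device = LRUCache(capacity=2)  # assumed stand-in for GPU memory
host = LRUCache(capacity=4)    # assumed stand-in for host memory
cache = HierarchicalCache([device, host], load=lambda key: f"item-{key}")
for key in [0, 1, 0, 1, 2, 0]:
    cache.get(key)
print(cache.loads, device.hits, host.hits)  # 3 2 1
```

Of the six accesses, two are served by the first level, one by the second, and only three reach storage; the last access to item 0 misses the small first level (item 0 was evicted when item 2 arrived) but is recovered from the second level instead of being reloaded.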
Contribution
Rocket's novel approach combines hierarchical caching, divide-and-conquer, and dynamic work-stealing to improve data reuse and scalability in all-pairs computations across diverse hardware.
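The abstract describes the work-stealing as hierarchical. One plausible reading, sketched below as a single-threaded simulation, is that an idle worker first steals from workers on its own node and only turns to a remote node when no local work is left, keeping most transfers cheap. The function name `hierarchical_work_stealing`, the round-robin stepping, and the "steal from the fullest deque" victim choice are assumptions for illustration; practical work-stealers usually run concurrently and pick victims at random, and Rocket's actual policy may differ.

```python
from __future__ import annotations

from collections import deque


def hierarchical_work_stealing(
    queues: list[list[str]], node_of: list[int]
) -> tuple[list[list[str]], int, int]:
    """Single-threaded simulation of workers (e.g. GPUs) grouped into nodes.

    Each worker pops tasks from the bottom of its own deque. An idle worker steals
    the oldest task from the fullest deque on its own node and falls back to a
    remote node only when every local deque is empty.
    Returns the tasks each worker ran and the number of local and remote steals.
    """
    deques = [deque(q) for q in queues]
    executed: list[list[str]] = [[] for _ in deques]
    local_steals = remote_steals = 0
    while any(deques):
        for me, own in enumerate(deques):  # each worker takes one step per round
            if own:
                executed[me].append(own.pop())  # own work: newest task first
                continue
            others = [v for v in range(len(deques)) if v != me and deques[v]]
            local = [v for v in others if node_of[v] == node_of[me]]
            remote = [v for v in others if node_of[v] != node_of[me]]
            candidates = local or remote  # prefer victims on the same node
            if not candidates:
                continue  # nothing left anywhere to steal
            victim = max(candidates, key=lambda v: len(deques[v]))
            executed[me].append(deques[victim].popleft())  # steal the oldest task
            if local:
                local_steals += 1
            else:
                remote_steals += 1
    return executed, local_steals, remote_steals


# Workers 0 and 1 share node 0; worker 2 is on node 1. Only worker 0 starts with work.
executed, local_steals, remote_steals = hierarchical_work_stealing(
    [["a0", "a1", "a2", "a3", "a4", "a5"], [], []], node_of=[0, 0, 1]
)
print(executed)  # [['a5', 'a4'], ['a0', 'a2'], ['a1', 'a3']]
print(local_steals, remote_steals)  # 2 2
```

Worker 1 always finds work on its own node, while worker 2 has no local neighbour with work and must steal remotely; these cross-node steals are the traffic a hierarchical policy tries to keep rare.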
Findings
Achieves excellent efficiency and scalability up to 96 GPUs.
Obtains super-linear speedups due to the distributed cache.
Demonstrates effectiveness on three real-world applications from digital forensics, localization microscopy, and bioinformatics.
Abstract
All-pairs compute problems apply a user-defined function to each combination of two items of a given data set. Although these problems present an abundance of parallelism, data reuse must be exploited to achieve good performance. Several researchers considered this problem, either resorting to partial replication with static work distribution or dynamic scheduling with full replication. In contrast, we present a solution that relies on hierarchical multi-level software-based caches to maximize data reuse at each level in the distributed memory hierarchy, combined with a divide-and-conquer approach to exploit data locality, hierarchical work-stealing to dynamically balance the workload, and asynchronous processing to maximize resource utilization. We evaluate our solution using three real-world applications (from digital forensics, localization microscopy, and bioinformatics) on…
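The two core ideas in the abstract, applying a user-defined function f to every pair of items and splitting that pair space recursively to exploit data locality, can be illustrated with a short sketch. It assumes f is symmetric, so only pairs with i < j are computed; that the pair matrix is split into quadrants; and that recursion stops at a hypothetical block size `leaf`. Each leaf task then touches only two small slices of the input, which is what lets cached items be reused. The function name `all_pairs` is hypothetical and this is not Rocket's actual decomposition; in a parallel runtime, each recursive call could become a task for workers to execute or steal.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def all_pairs(items: list[T], f: Callable[[T, T], R], leaf: int = 2) -> dict[tuple[int, int], R]:
    """Apply f to every unordered pair (i, j) with i < j via recursive divide-and-conquer."""
    if leaf < 1:
        raise ValueError("leaf must be at least 1")
    results: dict[tuple[int, int], R] = {}

    def solve(r0: int, r1: int, c0: int, c1: int) -> None:
        # This block covers rows [r0, r1) and columns [c0, c1) of the pair matrix.
        if r0 >= r1 or c0 >= c1 or r0 >= c1 - 1:
            return  # empty, or entirely on/below the diagonal: no pairs with i < j
        if r1 - r0 <= leaf and c1 - c0 <= leaf:
            # Leaf task: reads only items[r0:r1] and items[c0:c1], so both slices fit in a cache.
            for i in range(r0, r1):
                for j in range(max(c0, i + 1), c1):
                    results[(i, j)] = f(items[i], items[j])
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        for rs, re in ((r0, rm), (rm, r1)):  # recurse into the four quadrants
            for cs, ce in ((c0, cm), (cm, c1)):
                solve(rs, re, cs, ce)

    solve(0, len(items), 0, len(items))
    return results


items = [3, 1, 4, 1, 5]
pairs = all_pairs(items, lambda a, b: abs(a - b), leaf=2)
print(len(pairs))     # 10, i.e. 5 * 4 / 2 unordered pairs
print(pairs[(0, 2)])  # 1 == |3 - 4|
```

Blocks that fall entirely below the diagonal are pruned without visiting their pairs, so the work stays proportional to the n(n - 1)/2 pairs actually needed.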
