A Nested Partitioning Scheme for Parallel Heterogeneous Clusters
Jesse Kelly, Omar Ghattas, Hari Sundar

TL;DR
This paper introduces a nested partitioning scheme for heterogeneous clusters that improves load balancing and resource utilization by asymmetrically dividing work between CPUs and accelerators, enhancing parallel efficiency.
Contribution
The paper proposes a novel nested partitioning approach that enables effective work-parallelism on heterogeneous clusters, addressing load balancing and communication challenges.
Findings
Achieves high efficiency in heterogeneous cluster computations.
Balances CPU and accelerator utilization effectively.
Demonstrates improvements on wave propagation simulations.
Abstract
Modern supercomputers increasingly depend on accelerators and co-processors. However, it has not been easy to achieve good performance on such heterogeneous clusters. The key challenge has been to ensure good load balance, so that neither the CPU nor the accelerator is left idle. Traditional approaches have either offloaded entire computations to the accelerator, leaving the CPU idle, or opted for task-level parallelism, which requires large data transfers between the CPU and the accelerator. True work-parallelism has been hard to achieve because accelerators cannot communicate directly with CPUs other than their host, or with other accelerators. In this work, we present a new nested partitioning scheme to overcome this problem. By partitioning the work assignment on a given node asymmetrically into boundary and interior work, and assigning the interior to the accelerator, we are able to…
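The boundary/interior split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a node owns a rectangular block of cells, and the function name `nested_partition` and the `halo` width are hypothetical choices made only for this illustration. Cells within `halo` of the block edge form the boundary set, which would stay on the CPU because it exchanges data with neighboring nodes; the remaining cells form the interior set, which would go to the accelerator.

```python
def nested_partition(nx, ny, halo=1):
    """Split an nx-by-ny block of cells into boundary and interior sets.

    Illustrative only: `halo` is a hypothetical parameter giving the width
    of the boundary layer kept on the CPU.
    """
    boundary, interior = [], []
    for i in range(nx):
        for j in range(ny):
            # A cell is boundary work if it lies within `halo` cells of any edge.
            on_edge = i < halo or i >= nx - halo or j < halo or j >= ny - halo
            if on_edge:
                boundary.append((i, j))
            else:
                interior.append((i, j))
    return boundary, interior


# Example: an 8x8 block with a one-cell boundary layer.
boundary, interior = nested_partition(8, 8)
print(len(boundary), len(interior))
```

Widening `halo` moves more cells to the CPU side, which is one simple way to picture how such a split could be made asymmetric to match the relative speeds of the two devices.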
Taxonomy
Topics: Numerical methods in engineering · Electromagnetic Scattering and Analysis · Composite Material Mechanics
