Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects
Maya Taylor, Kavitha Chandrasekar, Laxmikant V. Kale

TL;DR
This paper introduces a communication-aware diffusion load balancing method tailored for communication-intensive parallel applications with persistent object interactions, aiming to reduce communication overhead and improve load distribution.
Contribution
It proposes a novel diffusion-based load balancing strategy that leverages communication graphs and offers an algorithmic variant for unknown communication patterns.
Findings
Reduces communication overhead in load balancing.
Effective load distribution demonstrated on Particle-in-Cell benchmark.
Outperforms related strategies in simulations and real-world tests.
Abstract
Parallel applications with irregular and time-varying workloads often suffer from load imbalance. Dynamic load balancing techniques address this challenge by redistributing work during execution. We present a new type of distributed diffusion-based load balancing targeted at communication-intensive applications with persistently communicating objects. Leveraging the application's communication graph, our strategy reduces across-node communication while simultaneously distributing load effectively. We also propose an algorithmic variant for cases where the communication patterns are not readily available. We explore optimizations to our algorithm and compare it with related load balancing strategies, both in simulation and on a Particle-in-Cell benchmark on up to 8 nodes of Perlmutter at NERSC.
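To make the idea in the abstract concrete, here is a minimal sketch of generic first-order diffusion combined with a communication-aware choice of which objects to migrate. It is not the authors' algorithm: the function names (`diffuse`, `pick_objects`), the `alpha` step size, the greedy selection rule, and all data layouts are assumptions made for illustration only.

```python
def diffuse(loads, neighbors, alpha=0.25, iters=50):
    """Generic first-order diffusion (illustrative, not the paper's method).

    Each round, node i gives alpha * (loads[i] - loads[j]) to every
    neighbor j. With a symmetric neighbor map, total load is conserved.
    """
    loads = [float(x) for x in loads]
    for _ in range(iters):
        delta = [0.0] * len(loads)
        for i, nbrs in neighbors.items():
            for j in nbrs:
                delta[i] -= alpha * (loads[i] - loads[j])
        loads = [l + d for l, d in zip(loads, delta)]
    return loads


def pick_objects(objs_on_src, obj_load, comm, placement, dst, amount):
    """Hypothetical greedy rule: when moving up to `amount` load from a
    source node to `dst`, prefer objects that communicate most with
    objects already placed on `dst`, so heavy edges become node-local.
    """
    def affinity(obj):
        # Sum of edge weights from obj to peers currently on dst.
        return sum(w for peer, w in comm.get(obj, {}).items()
                   if placement[peer] == dst)

    chosen, moved = [], 0.0
    for obj in sorted(objs_on_src, key=affinity, reverse=True):
        if moved + obj_load[obj] <= amount:
            chosen.append(obj)
            moved += obj_load[obj]
    return chosen


# Assumed toy setup: 4 nodes in a ring, objects a, b, c on node 0, d on node 1.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
balanced = diffuse([10, 0, 0, 2], ring)

placement = {"a": 0, "b": 0, "c": 0, "d": 1}
obj_load = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
comm = {"a": {"d": 5}, "b": {"d": 1}, "c": {}}
to_move = pick_objects(["a", "b", "c"], obj_load, comm, placement, dst=1, amount=2.0)
```

In this toy setup, diffusion decides how much load should flow between neighboring nodes, and the selection step decides which objects carry that load, favoring those whose communication partners already live on the destination.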
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
