Distributed Work Stealing in a Task-Based Dataflow Runtime
Joseph John, Josh Milthorpe, Peter Strazdins

TL;DR
This paper introduces distributed work stealing strategies for task-based dataflow runtimes and reports a speedup of up to 35% on sparse Cholesky factorization compared to a static division of work.
Contribution
The paper extends the PaRSEC runtime with distributed work stealing policies in which each process considers its future tasks and the expected waiting time for execution when deciding whether to steal.
Findings
Achieved up to 35% speedup in sparse Cholesky factorization compared to a static division of work
Effective load balancing with distributed work stealing policies
Demonstrated advantages over static work division
Abstract
The task-based dataflow programming model has emerged as an alternative to the process-centric programming model for extreme-scale applications. However, load balancing is still a challenge in task-based dataflow runtimes. In this paper, we present extensions to the PaRSEC runtime to demonstrate that distributed work stealing is an effective load-balancing method for task-based dataflow runtimes. In contrast to shared-memory work stealing, we find that each process should consider future tasks and the expected waiting time for execution when determining whether to steal. We demonstrate the effectiveness of the proposed work-stealing policies for a sparse Cholesky factorization, which shows a speedup of up to 35% compared to a static division of work.
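The steal decision described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation or the PaRSEC API: the `ProcessState` fields, the function names, and the specific thresholds (fewer known tasks than workers, expected wait longer than migration time) are all assumptions chosen for illustration. The sketch only shows the general idea that a process looks at future tasks as well as ready tasks, and that the expected waiting time of a task is weighed before it is moved.

```python
from dataclasses import dataclass


@dataclass
class ProcessState:
    """Hypothetical snapshot of one process's scheduler state."""
    ready_tasks: int      # tasks whose inputs are available now
    future_tasks: int     # tasks known locally whose inputs are still pending
    workers: int          # worker threads on this process
    avg_task_time: float  # average task execution time, in seconds


def thief_should_steal(thief: ProcessState) -> bool:
    """Assumed thief-side rule: request work only when ready plus
    future tasks cannot keep every worker busy."""
    return thief.ready_tasks + thief.future_tasks < thief.workers


def expected_wait(victim: ProcessState) -> float:
    """Rough estimate of how long a queued task waits before it runs."""
    return (victim.ready_tasks / victim.workers) * victim.avg_task_time


def victim_should_give(victim: ProcessState, migration_time: float) -> bool:
    """Assumed victim-side rule: give a task away only if it would wait
    locally longer than it takes to migrate it."""
    return expected_wait(victim) > migration_time


if __name__ == "__main__":
    starving = ProcessState(ready_tasks=0, future_tasks=0, workers=4, avg_task_time=0.01)
    busy_soon = ProcessState(ready_tasks=0, future_tasks=8, workers=4, avg_task_time=0.01)
    overloaded = ProcessState(ready_tasks=40, future_tasks=0, workers=4, avg_task_time=0.01)

    print(thief_should_steal(starving))
    print(thief_should_steal(busy_soon))
    print(victim_should_give(overloaded, migration_time=0.02))
```

In this sketch a process with an empty ready queue but pending future tasks does not steal, which is the contrast with shared-memory work stealing that the abstract points to; a shared-memory thief would typically steal as soon as its ready queue is empty.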
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
