Distributed Work Stealing in a Task-Based Dataflow Runtime

Joseph John; Josh Milthorpe; Peter Strazdins

arXiv:2211.00838 · cs.DC · November 11, 2022 · 1 citation

TL;DR

This paper introduces distributed work-stealing policies for task-based dataflow runtimes and reports a speedup of up to 35% over a static division of work for a sparse Cholesky factorization.

Contribution

It extends the PaRSEC runtime with distributed work-stealing policies in which each process considers its future tasks and the expected waiting time for execution when deciding whether to steal.

Findings

1. Achieved up to 35% speedup in sparse Cholesky factorization
2. Effective load balancing with distributed work-stealing policies
3. Demonstrated advantages over static work division

Abstract

The task-based dataflow programming model has emerged as an alternative to the process-centric programming model for extreme-scale applications. However, load balancing is still a challenge in task-based dataflow runtimes. In this paper, we present extensions to the PaRSEC runtime to demonstrate that distributed work stealing is an effective load-balancing method for task-based dataflow runtimes. In contrast to shared-memory work stealing, we find that each process should consider future tasks and the expected waiting time for execution when determining whether to steal. We demonstrate the effectiveness of the proposed work-stealing policies for a sparse Cholesky factorization, which shows a speedup of up to 35% compared to a static division of work.
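
The abstract's point about future tasks and expected waiting time can be illustrated with a small sketch. This is not the paper's implementation or the PaRSEC API. The `NodeState` fields, the function names, and the simple queue-length estimate of waiting time are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of one process's scheduler state."""
    ready_tasks: int       # tasks that can run right now
    future_tasks: int      # tasks expected to become ready soon (assumed known)
    workers: int           # worker threads on this process
    avg_task_time: float   # mean task execution time, in seconds


def should_request_steal(thief: NodeState) -> bool:
    """Thief side: ask for work only if ready plus future tasks
    cannot keep every local worker busy."""
    return thief.ready_tasks + thief.future_tasks < thief.workers


def expected_wait(victim: NodeState) -> float:
    """Rough estimate of how long a queued task waits on the victim
    (assumption: tasks are spread evenly across workers)."""
    return (victim.ready_tasks / victim.workers) * victim.avg_task_time


def should_grant_steal(victim: NodeState, migration_time: float) -> bool:
    """Victim side: release a task only if it would wait locally
    longer than it takes to migrate it to the thief."""
    return expected_wait(victim) > migration_time


if __name__ == "__main__":
    thief = NodeState(ready_tasks=0, future_tasks=1, workers=4, avg_task_time=0.01)
    victim = NodeState(ready_tasks=40, future_tasks=0, workers=4, avg_task_time=0.01)
    print(should_request_steal(thief))
    print(should_grant_steal(victim, migration_time=0.05))
```

A shared-memory thief would typically steal as soon as its ready queue is empty. In this sketch, a process with an empty ready queue but enough future tasks does not steal, and a victim refuses when migrating a task would cost more than leaving it queued.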

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems