Work-stealing for mixed-mode parallelism by deterministic team-building
Martin Wimmer, Jesper Larsson Träff

TL;DR
This paper extends classical work-stealing to data-parallel tasks that require multiple threads by introducing deterministic team-building, and reports a parallel Quicksort whose speed-up improves from 5.1 to 8.7 on a 32-core system.
Contribution
The paper introduces a generalized work-stealing algorithm with deterministic team-building for mixed-mode (task and data) parallelism, a prototype C++ implementation, and a parallel Quicksort built on top of it.
Findings
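Speed-up improved from 5.1 to 8.7 when sorting 2^27-1 randomly generated integers on a 32-core Intel Nehalem EX system.
Generalized work-stealing handles data-parallel tasks that require any number of threads r >= 1.
Consistently outperforms the tuned, task-parallel Cilk++ system on the parallel Quicksort benchmark.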
Abstract
We show how to extend classical work-stealing to deal also with data parallel tasks that can require any number of threads r >= 1 for their execution. We explain in detail the so introduced idea of work-stealing with deterministic team-building which in a natural way generalizes classical work-stealing. A prototype C++ implementation of the generalized work-stealing algorithm has been given and is briefly described. Building on this, a serious, well-known contender for a best parallel Quicksort algorithm has been implemented, which naturally relies on both task and data parallelism. For instance, sorting 2^27-1 randomly generated integers we could improve the speed-up from 5.1 to 8.7 on a 32-core Intel Nehalem EX system, being consistently better than the tuned, task-parallel Cilk++ system.
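The abstract does not spell out how teams are formed, so the following C++ sketch only illustrates what "deterministic" team-building could look like. It assumes (this rule is an assumption, not taken from the text above) that the number of worker threads is a power of two and that a task requiring r threads is run by the aligned block of consecutive thread ids whose size is the smallest power of two >= r. The names `Task`, `Team`, `team_size_for`, and `team_for` are hypothetical and are not the API of the prototype implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical task descriptor: threads_required plays the role of r >= 1.
struct Task {
    std::size_t threads_required;
};

// Hypothetical team description: an aligned block of consecutive thread ids.
struct Team {
    std::size_t first;  // id of the first thread in the block
    std::size_t size;   // number of threads in the block (a power of two)
};

// Smallest power of two that is >= r.
std::size_t team_size_for(std::size_t r) {
    assert(r >= 1);
    std::size_t size = 1;
    while (size < r) size *= 2;
    return size;
}

// Assumed rule: the team of thread `id` for `task` is the aligned block of
// team_size_for(r) consecutive ids that contains `id`. Every thread in the
// block computes the same answer locally, so no negotiation is needed.
Team team_for(std::size_t id, const Task& task, std::size_t num_threads) {
    // The sketch assumes a power-of-two thread count so that blocks tile it.
    assert(num_threads >= 1 && (num_threads & (num_threads - 1)) == 0);
    assert(id < num_threads);
    std::size_t size = team_size_for(task.threads_required);
    assert(size <= num_threads);
    return Team{id - id % size, size};
}

// A thread takes part in the task only if its position inside the block
// is below r; the remaining threads of the block would stay idle or wait.
bool participates(std::size_t id, const Task& task, std::size_t num_threads) {
    Team team = team_for(id, task, num_threads);
    return id - team.first < task.threads_required;
}
```

Under these assumptions, `team_for(13, Task{4}, 32)` yields the block starting at thread 12 with size 4, and a task with r = 1 degenerates to a one-thread team, which matches the classical work-stealing case the abstract says is generalized.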
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
