TL;DR
This paper introduces NUMA-WS, a NUMA-aware task parallel platform based on the work-first principle, which reduces work inflation and maintains efficiency and scalability on modern NUMA architectures.
Contribution
It presents a novel NUMA-aware scheduling platform that mitigates work inflation while preserving the theoretical guarantees of work stealing.
Findings
NUMA-WS reduces work inflation in task parallel programs.
The platform maintains work efficiency and scalability.
Empirical results show improved performance on NUMA architectures.
Abstract
Task parallelism is designed to simplify parallel programming. When executed on modern NUMA architectures, a task parallel program can fail to scale due to a phenomenon called work inflation: the total processing time that multiple cores spend doing useful work is higher than the time required to do the same amount of work on one core, because of effects experienced only during parallel execution, such as additional cache misses, remote memory accesses, and memory bandwidth issues. It is possible to mitigate work inflation by co-locating the computation with the data, but this is nontrivial to do with task parallel programs. First, by design, the scheduling for task parallel programs is automated, giving the user little control over where the computation is performed. Second, the platforms tend to employ work stealing, which provides strong theoretical…
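To make the definition of work inflation concrete, the following is a minimal sketch that expresses it as a ratio: the total time all cores spend on useful work, divided by the time the same work takes on one core. The function name `work_inflation` and the timing values are hypothetical and chosen for illustration only; they are not measurements or code from the paper.

```python
def work_inflation(per_core_work_seconds, serial_work_seconds):
    """Return total useful-work time across cores divided by one-core work time.

    A value of 1.0 means no inflation; values above 1.0 mean the cores
    collectively spent more time on the same work than a single core would.
    """
    if serial_work_seconds <= 0:
        raise ValueError("serial_work_seconds must be positive")
    return sum(per_core_work_seconds) / serial_work_seconds


# Hypothetical timings (assumed for illustration, not from the paper):
# useful-work time measured on each of four cores, in seconds.
per_core = [3.0, 3.5, 3.25, 3.25]
# Time for the same work on a single core, in seconds.
serial = 10.0

inflation = work_inflation(per_core, serial)
print(f"work inflation: {inflation:.2f}x")
```

In this made-up case the four cores spend 13 seconds in total on work that takes 10 seconds on one core, so even a perfectly load-balanced schedule could not reach a 4x speedup.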
