DCAFE: Dynamic load-balanced loop Chunking & Aggressive Finish Elimination for Recursive Task Parallel Programs
Suyash Gupta, Rahul Shrivastava, V. Krishna Nandivada

TL;DR
This paper introduces DCAFE, a pair of optimizations for recursive task parallel (RTP) programs that reduce task creation and termination overheads, improving performance on both an Intel and an AMD system and lowering energy consumption on the Intel system.
Contribution
The paper presents two optimizations, Aggressive Finish-Elimination (AFE) and Dynamic Load-Balanced loop Chunking (DLBC), implemented together in the X10v2.3 compiler to reduce the cost of creating and joining tasks in recursive task parallel programs.
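As a rough illustration of what eliminating joins can mean, the sketch below is written in Java rather than X10 and is not the paper's transformation. It assumes a redundant join is one where every recursive call waits for its own children even though an enclosing caller already waits for the whole computation. The names `FinishScope`, `async`, `awaitAll`, `visit`, and `countNodes` are hypothetical. The recursion spawns tasks without joining them, and a single outer scope waits once for everything.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicLong;

final class AfeSketch {
    /** Hypothetical stand-in for one outer join scope that tracks every task spawned under it. */
    static final class FinishScope {
        // Party 1 is the thread that will eventually call awaitAll().
        // Note: a Phaser supports at most 65535 registered parties at a time.
        private final Phaser phaser = new Phaser(1);
        private final ExecutorService pool;

        FinishScope(ExecutorService pool) {
            this.pool = pool;
        }

        /** Spawns a task; it is registered before its parent can finish. */
        void async(Runnable body) {
            phaser.register();
            pool.execute(() -> {
                try {
                    body.run();
                } finally {
                    phaser.arriveAndDeregister();
                }
            });
        }

        /** The single join: returns once every spawned task has completed. */
        void awaitAll() {
            phaser.arriveAndAwaitAdvance();
        }
    }

    /** Visits a complete binary tree of the given depth; no join inside the recursion. */
    static void visit(int depth, FinishScope scope, AtomicLong counter) {
        counter.incrementAndGet();
        if (depth == 0) {
            return;
        }
        scope.async(() -> visit(depth - 1, scope, counter));
        scope.async(() -> visit(depth - 1, scope, counter));
    }

    /** Counts the tree's nodes using one outer join instead of one join per call. */
    static long countNodes(int depth, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            FinishScope scope = new FinishScope(pool);
            AtomicLong counter = new AtomicLong();
            visit(depth, scope, counter);
            scope.awaitAll();
            return counter.get();
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no task blocks on its children, worker threads are never parked inside the recursion. The trade-off is that results must be combined through shared state (here an `AtomicLong`) rather than returned up the call chain.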
Findings
Achieved up to 5.75x speedup on the Intel system
Reduced energy consumption by 71.2% on the Intel system
Improved performance on both tested platforms (a 16-core Intel system and a 64-core AMD system)
Abstract
In this paper, we present two symbiotic optimizations for recursive task parallel (RTP) programs that reduce task creation and termination overheads. Our first optimization, Aggressive Finish-Elimination (AFE), eliminates a large share of redundant join operations. The second optimization, Dynamic Load-Balanced loop Chunking (DLBC), extends prior work on loop chunking to decide, at runtime, the number of parallel tasks based on the number of available worker threads. Further, we discuss the impact of exceptions on our optimizations and extend them to handle RTP programs that may throw exceptions. We implemented DCAFE (= DLBC+AFE) in the X10v2.3 compiler and tested it over a set of benchmark kernels on two different hardware platforms (a 16-core Intel system and a 64-core AMD system). With respect to the base X10 compiler extended with loop-chunking of Nandivada et al…
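The chunking idea described in the abstract, choosing the number of parallel tasks from the number of worker threads, can be sketched as follows. This is a simplified Java illustration, not the DLBC algorithm itself. It assumes the worker count is passed in as a parameter (for example from `Runtime.getRuntime().availableProcessors()`), whereas the paper decides based on workers available at runtime. The names `chunks` and `parallelSum` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

final class DlbcSketch {
    /** Splits [0, n) into at most `workers` contiguous [start, end) ranges of near-equal size. */
    static List<int[]> chunks(int n, int workers) {
        List<int[]> out = new ArrayList<>();
        if (n <= 0 || workers <= 0) {
            return out;
        }
        int tasks = Math.min(n, workers); // never create more tasks than iterations
        int base = n / tasks;
        int extra = n % tasks;            // the first `extra` chunks get one more iteration
        int start = 0;
        for (int t = 0; t < tasks; t++) {
            int len = base + (t < extra ? 1 : 0);
            out.add(new int[] {start, start + len});
            start += len;
        }
        return out;
    }

    /** Sums body(i) for i in [0, n), creating one task per chunk instead of one per iteration. */
    static long parallelSum(int n, IntUnaryOperator body, int workers) throws Exception {
        // Assumes workers >= 1; newFixedThreadPool rejects a non-positive size.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int[] r : chunks(n, workers)) {
                Callable<Long> task = () -> {
                    long s = 0;
                    for (int i = r[0]; i < r[1]; i++) {
                        s += body.applyAsInt(i);
                    }
                    return s;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `DlbcSketch.chunks(10, 3)` yields the ranges [0, 4), [4, 7), and [7, 10), so a 10-iteration parallel loop creates three tasks rather than ten.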
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
