A Distributed Chunk Calculation Approach for Self-scheduling of Parallel Applications on Distributed-memory Systems
Ahmed Eleliemy, Florina M. Ciorba

TL;DR
This paper introduces a distributed chunk calculation approach (DCA) for dynamic loop self-scheduling (DLS) on distributed-memory systems. DCA improves load balancing and performance, especially under system slowdown conditions.
Contribution
The paper proposes a distributed chunk calculation approach (DCA) that supports various DLS techniques. DLS techniques implemented with DCA outperform their counterparts implemented with the traditional centralized chunk calculation approach (CCA).
Findings
DCA-based DLS techniques outperform CCA-based ones in load balancing.
DCA improves performance under extreme system slowdown scenarios.
Twelve DLS techniques were implemented with both CCA and DCA and evaluated in different CPU slowdown scenarios.
Abstract
Loop scheduling techniques aim to achieve load-balanced executions of scientific applications. Dynamic loop self-scheduling (DLS) libraries for distributed-memory systems are typically MPI-based and employ a centralized chunk calculation approach (CCA) to assign variably-sized chunks of loop iterations. We present a distributed chunk calculation approach (DCA) that supports various types of DLS techniques. Using both CCA and DCA, twelve DLS techniques are implemented and evaluated in different CPU slowdown scenarios. The results show that the DLS techniques implemented using DCA outperform their counterparts implemented with CCA, especially in extreme system slowdown scenarios.
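The difference between the two approaches can be illustrated with a small, single-process sketch. It is not the paper's implementation: it assumes guided self-scheduling (chunk size = ceil(remaining / workers)) as the example DLS technique, and the names `gss_chunk`, `SharedCounter`, `schedule_centralized`, and `schedule_distributed` are hypothetical. In the centralized variant one coordinator computes every chunk size; in the distributed variant each worker reads the shared scheduling state, computes its own chunk size, and then tries to claim the chunk atomically.

```python
import math


def gss_chunk(remaining, workers):
    """Guided self-scheduling rule (assumed example): ceil(remaining / workers)."""
    return math.ceil(remaining / workers)


def schedule_centralized(total, workers):
    """CCA-style sketch: a single coordinator computes every chunk size."""
    start, chunks = 0, []
    while start < total:
        size = gss_chunk(total - start, workers)  # computed by the coordinator
        chunks.append((start, size))
        start += size
    return chunks


class SharedCounter:
    """Hypothetical stand-in for globally shared scheduling state."""

    def __init__(self):
        self.next_start = 0

    def read(self):
        return self.next_start

    def compare_and_advance(self, expected, size):
        """Advance only if no other worker claimed a chunk in the meantime."""
        if self.next_start != expected:
            return False
        self.next_start += size
        return True


def schedule_distributed(total, workers):
    """DCA-style sketch: each worker computes its own chunk size."""
    counter, chunks, worker = SharedCounter(), [], 0
    while counter.read() < total:
        start = counter.read()
        size = gss_chunk(total - start, workers)  # computed by the worker itself
        if counter.compare_and_advance(start, size):
            chunks.append((worker, start, size))
        worker = (worker + 1) % workers  # simulate workers taking turns
    return chunks


if __name__ == "__main__":
    print(schedule_centralized(100, 4))
    print(schedule_distributed(100, 4))
```

Both variants produce the same sequence of chunk sizes here, because the chunk rule depends only on the shared state. What changes is where the calculation runs: in the distributed variant the coordinator's role shrinks to maintaining the shared counter, which in a real MPI setting would require some form of atomic remote update (an assumption of this sketch, not a detail taken from the abstract).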
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
