# Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

**Authors:** Ahmed Eleliemy, Florina M. Ciorba

arXiv: 1903.09510 · 2019-03-25

## TL;DR

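This paper designs and implements hierarchical dynamic loop self-scheduling (DLS) techniques for distributed-memory systems using the MPI+MPI approach, which builds on the MPI-3 feature of sharing memory regions among MPI processes. The evaluation indicates certain performance advantages over the hybrid MPI+OpenMP approach.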

## Contribution

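The paper introduces an MPI+MPI-based implementation of hierarchical DLS techniques, which self-schedule loop iterations at two levels: across and within compute nodes. It replaces the usual hybrid MPI+OpenMP implementation with MPI-3 shared-memory regions, an approach the authors describe as simplifying the implementation of parallel scientific applications.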

## Key findings

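- Certain performance advantages of the MPI+MPI approach over the hybrid MPI+OpenMP approach
- Hierarchical DLS techniques target load imbalance in irregular loops, both across and within compute nodes
- Four well-known DLS techniques are implemented and evaluated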

## Abstract

Computationally intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their loop iterations is critical for achieving high performance. However, several factors may lead to an imbalanced load execution, such as problem characteristics and algorithmic and systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors, and consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach that simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation proposed herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.
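
To make the two-level idea in the abstract concrete, the following is a minimal sequential sketch, not the authors' implementation. The abstract does not name the four DLS techniques, so guided self-scheduling (GSS), where each request receives `ceil(remaining / P)` iterations, is used here purely as an assumed example. The function names `gss_chunks` and `hierarchical_gss` are hypothetical, and the sketch contains no MPI calls: it only shows how chunk sizes could be computed first across nodes and then within a node.

```python
def gss_chunks(total_iterations, num_workers):
    """Yield guided self-scheduling chunk sizes.

    Each scheduling request receives ceil(remaining / num_workers) iterations,
    so chunks shrink as the loop drains.
    """
    remaining = total_iterations
    while remaining > 0:
        # Integer ceiling division avoids floating-point rounding.
        chunk = (remaining + num_workers - 1) // num_workers
        yield chunk
        remaining -= chunk


def hierarchical_gss(total_iterations, num_nodes, workers_per_node):
    """Return a two-level schedule as a list of (node_chunk, sub_chunks).

    Level 1 (assumed): chunks are self-scheduled across compute nodes.
    Level 2 (assumed): each node chunk is self-scheduled again among the
    workers inside that node.
    """
    schedule = []
    for node_chunk in gss_chunks(total_iterations, num_nodes):
        sub_chunks = list(gss_chunks(node_chunk, workers_per_node))
        schedule.append((node_chunk, sub_chunks))
    return schedule


if __name__ == "__main__":
    # Hypothetical configuration: 100 iterations, 2 nodes, 4 workers per node.
    for node_chunk, sub_chunks in hierarchical_gss(100, 2, 4):
        print(node_chunk, sub_chunks)
```

In an actual MPI+MPI implementation, the within-node level would presumably coordinate through an MPI-3 shared-memory region rather than a Python list; that part is outside the scope of this sketch.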

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.09510/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1903.09510/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1903.09510/full.md

---
Source: https://tomesphere.com/paper/1903.09510