Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures
Shelby Lockhart, Amanda Bienz, William D. Gropp, Luke N. Olson

TL;DR
This paper analyzes the performance of node-aware communication strategies for irregular point-to-point MPI communication on heterogeneous supercomputers, highlighting the benefits of staging data through host processes and utilizing all CPU cores.
Contribution
It introduces performance models for irregular communication on heterogeneous architectures, demonstrating the effectiveness of node-aware strategies and staging through host processes.
Findings
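The models predict that node-aware communication using all available CPU cores to send inter-node data is the most efficient strategy at high node counts.
For high inter-node message counts, the models suggest staging data through host processes and then applying node-aware communication strategies.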
Models accurately predict communication behavior in distributed sparse matrix-vector operations.
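The trade-off behind node-aware communication can be illustrated with a simple postal (latency plus bandwidth) cost model. The sketch below is not the set of models presented in the paper. It is a minimal illustration under stated assumptions: every function name, the three-step aggregation scheme, the even split of data across processes, and the latency and bandwidth constants in the test values are hypothetical, and the cost of GPU-to-host copies is not modeled.

```python
import math


def postal(n_msgs, n_bytes, alpha, beta):
    """Postal model: per-message latency (alpha) plus per-byte cost (beta)."""
    return n_msgs * alpha + n_bytes * beta


def standard_cost(msgs_per_proc, msg_bytes, alpha_inter, beta_inter):
    """Every process sends each of its inter-node messages directly."""
    total_bytes = msgs_per_proc * msg_bytes
    return postal(msgs_per_proc, total_bytes, alpha_inter, beta_inter)


def node_aware_cost(msgs_per_proc, msg_bytes, ppn, dest_nodes,
                    alpha_inter, beta_inter, alpha_intra, beta_intra):
    """Hypothetical three-step scheme: gather on-node, send aggregated
    inter-node messages from all ppn processes, redistribute on-node."""
    # Assumption: all data leaving the node is split evenly over its processes.
    node_bytes = ppn * msgs_per_proc * msg_bytes
    bytes_per_proc = node_bytes / ppn
    # Assumption: one aggregated message per destination node, with the
    # destination nodes divided among the ppn processes.
    inter_msgs = math.ceil(dest_nodes / ppn)
    # On-node exchange before sending and after receiving, modeled as
    # at most ppn - 1 local messages per process in each phase.
    local = 2 * postal(ppn - 1, bytes_per_proc, alpha_intra, beta_intra)
    return local + postal(inter_msgs, bytes_per_proc, alpha_inter, beta_inter)
```

In this toy model the aggregated scheme pays a fixed on-node overhead in exchange for far fewer inter-node messages, so it only helps when each process would otherwise send many inter-node messages.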
Abstract
Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we characterize the performance of irregular point-to-point communication with MPI on heterogeneous compute environments through performance modeling, demonstrating the limitations of standard communication strategies for both device-aware and staging-through-host communication techniques. Presented models suggest staging communicated data through host processes then using node-aware communication strategies for high inter-node message counts. Notably, the models also predict that node-aware communication utilizing all available CPU cores to communicate inter-node data leads…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
