A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids
N. T. Karonis, B. de Supinski, I. Foster, W. Gropp, E. Lusk

TL;DR
This paper introduces a multilevel topology-aware approach for optimizing collective communication operations in heterogeneous computational grids, significantly improving performance over traditional and two-layer methods.
Contribution
It presents a novel multilevel topology-aware tree construction strategy for MPI collectives, leveraging detailed network hierarchy information during runtime.
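The tree construction idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each process is described by a tuple of group names ordered from the slowest network level to the fastest (here a hypothetical `(site, machine)` pair), and it uses a simple flat fan-out to one representative per group at each level. The function name `multilevel_bcast_edges` and the sample layout are illustrative only.

```python
from collections import defaultdict


def multilevel_bcast_edges(coords, root):
    """Return (sender, receiver, level) edges for a broadcast from `root`.

    `coords` maps each rank to a tuple naming its group at every level,
    slowest level first, e.g. (site, machine). `level` is the first level
    at which sender and receiver differ (0 = slowest channel); it equals
    the tuple length when both ranks share every group.
    """
    depth = len(next(iter(coords.values())))
    edges = []

    def spread(leader, members, level):
        if level == depth:
            # All members share every group: send over the fastest channel.
            edges.extend((leader, r, level) for r in members if r != leader)
            return
        groups = defaultdict(list)
        for r in members:
            groups[coords[r][level]].append(r)
        for _, group in sorted(groups.items()):
            if leader in group:
                rep = leader  # the leader already holds the data
            else:
                rep = min(group)  # assumption: lowest rank represents a group
                edges.append((leader, rep, level))
            spread(rep, group, level + 1)

    spread(root, sorted(coords), 0)
    return edges


# Hypothetical layout: two sites, two machines per site.
coords = {
    0: ("A", "m1"), 1: ("A", "m1"), 2: ("A", "m2"),
    3: ("B", "m3"), 4: ("B", "m3"), 5: ("B", "m4"),
}
edges = multilevel_bcast_edges(coords, root=0)
for sender, receiver, level in edges:
    print(sender, "->", receiver, "level", level)
```

In this layout only one message crosses between sites; all other sends stay within a site or within a machine. A real implementation would likely use a more efficient tree shape among the representatives at each level.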
Findings
Multilevel topology-aware trees outperform default and two-layer methods.
Significant communication cost reductions in heterogeneous network environments.
Automatic construction of optimized trees during execution.
Abstract
The efficient implementation of collective communication operations has received much attention. Initial efforts produced "optimal" trees based on network communication models that assumed equal point-to-point latencies between any two processes. This assumption is violated in most practical settings, however, particularly in heterogeneous systems such as clusters of SMPs and wide-area "computational Grids," with the result that collective operations perform suboptimally. In response, more recent work has focused on creating topology-aware trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts have significant communication benefits, they all limit their view of the network to only two layers. We present a strategy based upon a multilayer view of the network. By creating multilevel topology-aware trees we take…
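To make the cost of the equal-latency assumption concrete, the sketch below counts how many broadcast messages cross the slow channel when a tree is built without topology information. It is an illustration under stated assumptions, not a result from the paper: it uses one common binomial-tree formulation (in each round, every rank that already holds the data sends to the rank `step` positions higher), and a hypothetical placement of 8 ranks split in blocks of 4 across two sites. The count depends on both the tree variant and the rank placement.

```python
def binomial_edges(n):
    """Edges of a binomial broadcast tree rooted at rank 0 over n ranks."""
    edges = []
    step = 1
    while step < n:
        # Ranks 0..step-1 already hold the data and each sends once.
        for r in range(step):
            if r + step < n:
                edges.append((r, r + step))
        step *= 2
    return edges


def slow_link_messages(edges, site_of):
    """Count edges whose endpoints sit at different sites."""
    return sum(1 for src, dst in edges if site_of[src] != site_of[dst])


# Hypothetical placement: ranks 0-3 at site 0, ranks 4-7 at site 1.
site_of = [r // 4 for r in range(8)]

unaware = slow_link_messages(binomial_edges(8), site_of)
# Any broadcast needs at least one slow message per remote site.
aware_minimum = len(set(site_of)) - 1

print("topology-unaware binomial tree:", unaware, "wide-area messages")
print("lower bound for a topology-aware tree:", aware_minimum)
```

A topology-aware tree aims to reach this lower bound on the slowest channel, and a multilevel tree applies the same reasoning again at each faster level of the hierarchy.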
Taxonomy
Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems
