$k$-ported vs. $k$-lane Broadcast, Scatter, and Alltoall Algorithms
Jesper Larsson Träff

TL;DR
This paper compares algorithms for collective communication operations in a $k$-lane message-passing model, analyzing their performance on a small cluster, and explores how to adapt $k$-ported algorithms to this setting.
Contribution
It introduces and evaluates $k$-lane algorithms for the broadcast, scatter, and alltoall operations, adapting algorithms from the $k$-ported model to modern clustered systems.
Findings
Preliminary experimental results on a $36\times 32$ cluster.
Comparison of non-optimal $k$-lane algorithms.
Insights into adapting $k$-ported algorithms for $k$-lane systems.
Abstract
In $k$-ported message-passing systems, a processor can simultaneously receive $k$ different messages from $k$ other processors, and send $k$ different messages to $k$ other processors that may or may not be different from the processors from which messages are received. Modern clustered systems may not have such capabilities. Instead, compute nodes consisting of processors can simultaneously send and receive $k$ messages from other nodes, by letting $k$ processors on the nodes concurrently send and receive at most one message. We pose the question of how to design good algorithms for this $k$-lane model, possibly by adapting algorithms devised for the traditional $k$-ported model. We discuss and compare a number of (non-optimal) $k$-lane algorithms for the broadcast, scatter and alltoall collective operations (as found in, e.g., MPI), and experimentally evaluate these on a small…
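To make the role of $k$ concrete, the following is a minimal sketch, not an algorithm from the paper, that counts communication rounds for a broadcast when every informed unit can send to $k$ uninformed units per round. The function name `broadcast_rounds` is hypothetical. Applying it at node granularity to a $k$-lane system assumes that the message is already available to all $k$ sending processors of an informed node, so the cost of distributing data inside a node is ignored. Reading 36 as a node count and choosing $k = 2$ are assumptions made for illustration only.

```python
def broadcast_rounds(units: int, k: int) -> int:
    """Count rounds until `units` units are informed, starting from one root.

    Assumption: in each round every informed unit sends the message to k
    distinct uninformed units, so the informed set grows by a factor (k + 1).
    """
    if units < 1 or k < 1:
        raise ValueError("units and k must be positive")
    informed = 1  # only the root holds the message initially
    rounds = 0
    while informed < units:
        informed *= k + 1  # each informed unit informs k new ones
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Hypothetical setting: 36 nodes, compared for one and two lanes.
    for lanes in (1, 2):
        print(f"k={lanes}: {broadcast_rounds(36, lanes)} rounds")
```

Under these assumptions the count equals $\lceil \log_{k+1} p \rceil$ for $p$ units, which is the usual round count for broadcast in the $k$-ported model. Whether a $k$-lane algorithm can come close to it depends on the intra-node costs that this sketch leaves out.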
Taxonomy
Topics: Interconnection Networks and Systems · Distributed systems and fault tolerance · Cloud Computing and Resource Management
