Efficient and Scalable Barrier over Quadrics and Myrinet with a New NIC-Based Collective Message Passing Protocol
Weikuan Yu, Darius Buntinas, Rich L. Graham, and Dhabaleswar K. Panda

TL;DR
This paper presents a new NIC-based collective protocol that lowers barrier latency on high-performance interconnects, reaching 5.60μs on Quadrics and 14.20μs on Myrinet over 8 nodes, with estimates indicating that the design scales to large clusters.
Contribution
The paper introduces a novel NIC-based collective protocol that simplifies communication processing, enabling scalable and efficient barrier operations on high-performance interconnects.
Findings
Barrier latency of 5.60μs over 8 nodes on Quadrics
Barrier latency of 14.20μs over 8 nodes on Myrinet
Estimated 22.13μs latency for 1024-node cluster on Quadrics
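To put the two Quadrics figures above side by side, the following is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, that barrier latency grows linearly in log2 of the node count; this is not taken from the paper, whose own estimation model is not reproduced here. The name `per_doubling_cost` is hypothetical.

```python
import math

# Quadrics figures quoted in the Findings list: (nodes, latency in microseconds).
MEASURED_8 = (8, 5.60)          # measured on 8 nodes
ESTIMATED_1024 = (1024, 22.13)  # the paper's estimate for 1024 nodes


def per_doubling_cost(p1, p2):
    """Added latency per doubling of node count between two points.

    Assumes latency is linear in log2(N). This is an illustrative
    assumption, not the paper's analytical model.
    """
    (n1, t1), (n2, t2) = p1, p2
    return (t2 - t1) / (math.log2(n2) - math.log2(n1))


slope = per_doubling_cost(MEASURED_8, ESTIMATED_1024)
growth = ESTIMATED_1024[1] / MEASURED_8[1]

print(f"{slope:.2f} us added per doubling of node count")
print(f"{growth:.2f}x latency for 128x the nodes")
```

Under that assumption, going from 8 to 1024 nodes (seven doublings) adds roughly 2.4μs per doubling, so latency grows by about a factor of four while the node count grows by a factor of 128.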
Abstract
Modern interconnects often have programmable processors in the network interface that can be used to offload communication processing from the host CPU. In this paper, we explore different schemes to support collective operations at the network interface and propose a new collective protocol. With barrier as an initial case study, we have demonstrated that much of the communication processing can be greatly simplified with this collective protocol. Accordingly, we have designed and implemented efficient and scalable NIC-based barrier operations over two high-performance interconnects, Quadrics and Myrinet. Our evaluation shows that, over a Quadrics cluster of 8 nodes with the Elan3 network, the NIC-based barrier operation achieves a barrier latency of only 5.60μs. This result is a 2.48 factor of improvement over the Elanlib tree-based…
Taxonomy
Topics: Interconnection Networks and Systems · Parallel Computing and Optimization Techniques · Distributed systems and fault tolerance
