Leveraging InfiniBand Controller to Configure Deadlock-Free Routing Engines for Dragonflies
German Maglione-Mathey, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles, Eitan Zahavi

TL;DR
This paper presents a method to integrate deadlock-free routing algorithms into InfiniBand controllers for Dragonfly networks, validated through experiments on real and simulated clusters.
Contribution
The paper introduces a straightforward approach to implementing a deadlock-free routing engine in OpenSM for Dragonfly topologies, filling a gap in existing InfiniBand support.
Findings
The new routing engine is validated on real hardware and simulation.
Performance comparisons show advantages over existing routing engines.
The method enables deadlock-free routing in InfiniBand Dragonfly networks.
Abstract
The Dragonfly topology is currently one of the most popular network topologies in high-performance parallel systems. The interconnection networks of many of these systems are built from components based on the InfiniBand specification. However, due to some constraints in this specification, the available versions of the InfiniBand network controller (OpenSM) do not include routing engines based on some popular deadlock-free routing algorithms proposed theoretically for Dragonflies, such as the one proposed by Kim and Dally based on Virtual-Channel shifting. In this paper we propose a straightforward method to integrate this routing algorithm in OpenSM as a routing engine, explaining in detail the configuration required to support it. We also provide experiment results, obtained both from a real InfiniBand-based cluster and from simulation, to validate the new routing engine and to…
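To make the idea of Virtual-Channel shifting concrete, the following is a minimal Python sketch of minimal Dragonfly routing in which the virtual-channel index is incremented after a packet crosses a global link. It is an illustration under stated assumptions, not the paper's OpenSM routing engine: the function name `minimal_route`, the fully connected groups, the `g = a*h + 1` group count, and the rule that global link `k` of group `G` leads to group `(G + k + 1) mod g` are all hypothetical choices made for this example. In InfiniBand, virtual channels are realized as virtual lanes, and how the paper maps the shift onto them is not shown here.

```python
def minimal_route(src, dst, a, h):
    """Return the minimal path from src to dst as a list of hops.

    src, dst: (group, router) tuples.
    a: routers per group, h: global links per router (hypothetical layout).
    Each hop is (from_node, to_node, kind, vc), kind is "local" or "global".
    Assumption: the VC index is shifted by one after each global hop.
    """
    g = a * h + 1                      # assumed number of groups
    (sg, sr), (dg, dr) = src, dst
    hops = []
    vc = 0
    cur = (sg, sr)

    if sg != dg:
        # Assumed arrangement: link k of group G reaches group (G + k + 1) % g,
        # and link k is attached to router k // h of its group.
        k = (dg - sg - 1) % g
        exit_router = k // h
        if sr != exit_router:
            hops.append((cur, (sg, exit_router), "local", vc))
            cur = (sg, exit_router)

        # The same physical link, seen from the destination group.
        k_back = (sg - dg - 1) % g
        entry_router = k_back // h
        hops.append((cur, (dg, entry_router), "global", vc))
        cur = (dg, entry_router)
        vc += 1                        # VC shift after the global hop

    if cur[1] != dr:
        hops.append((cur, (dg, dr), "local", vc))
    return hops


if __name__ == "__main__":
    for hop in minimal_route((0, 1), (1, 0), a=2, h=1):
        print(hop)
```

Because the channel index never decreases along a path and local hops in the destination group use a different channel from local hops in the source group, the local-global-local pattern cannot close a cycle in the channel dependency graph under these assumptions; two channels suffice for minimal paths in this sketch.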
