A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network
Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler

TL;DR
This paper presents the design, deployment, and evaluation of a real-world Slim Fly network, demonstrating its advantages over traditional topologies in performance, cost, and scalability for large-scale server clusters.
Contribution
First real-world implementation and deployment of a Slim Fly network, including a novel routing architecture for InfiniBand-based low-diameter topologies and practical deployment techniques.
Findings
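SF scales better than non-blocking Fat Trees, with comparable or better performance and lower cost at large network sizes
SF achieves strong performance on deep neural network training, graph analytics, and linear algebra kernels
Deployment techniques simplify cabling and cabling validation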
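Cabling validation amounts to comparing the planned wiring with the links the fabric actually reports. The sketch below is a minimal illustration of that comparison, not the authors' tooling. It assumes the observed links were already extracted as switch-name pairs, for example by parsing a fabric-discovery dump such as `ibnetdiscover` output (parsing not shown). The function names and switch names are hypothetical.

```python
from typing import Iterable, List, Tuple

Link = Tuple[str, str]


def normalize(links: Iterable[Link]) -> set:
    """Treat cables as undirected: (a, b) and (b, a) are the same link.

    Note: parallel cables between the same pair collapse into one entry.
    """
    return {frozenset(link) for link in links}


def validate_cabling(expected: Iterable[Link],
                     observed: Iterable[Link]) -> Tuple[List[Link], List[Link]]:
    """Return (missing, unexpected) links as sorted tuples (hypothetical helper)."""
    exp, obs = normalize(expected), normalize(observed)
    missing = exp - obs      # planned cables that discovery did not report
    unexpected = obs - exp   # reported cables absent from the plan (likely miswired)
    as_sorted = lambda links: sorted(tuple(sorted(link)) for link in links)
    return as_sorted(missing), as_sorted(unexpected)


if __name__ == "__main__":
    plan = [("sw1", "sw2"), ("sw2", "sw3"), ("sw1", "sw3")]
    seen = [("sw2", "sw1"), ("sw2", "sw4"), ("sw1", "sw3")]
    missing, unexpected = validate_cabling(plan, seen)
    print(missing, unexpected)  # [('sw2', 'sw3')] [('sw2', 'sw4')]
```

Each missing link paired with an unexpected link on the same switch points to a cable plugged into the wrong peer. That pairing is what makes this kind of diff useful for localizing miswirings.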
Abstract
Novel low-diameter network topologies such as Slim Fly (SF) offer significant cost and power advantages over the established Fat Tree, Clos, or Dragonfly. To spearhead the adoption of low-diameter networks, we design, implement, deploy, and evaluate the first real-world SF installation. We focus on deployment, management, and operational aspects of our test cluster with 200 servers and carefully analyze performance. We demonstrate techniques for simple cabling and cabling validation as well as a novel high-performance routing architecture for InfiniBand-based low-diameter topologies. Our real-world benchmarks show SF's strong performance for many modern workloads such as deep neural network training, graph analytics, or linear algebra kernels. SF outperforms non-blocking Fat Trees in scalability while offering comparable or better performance and lower cost for large network sizes. Our…
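The Slim Fly topology was originally constructed from McKay–Miller–Širáň (MMS) graphs, which give router graphs of diameter 2. The sketch below builds such a router graph and checks that claim by BFS. It is only an illustration, restricted by assumption to prime q with q ≡ 1 (mod 4), which is the δ = 1 case. Routers are (0, x, y) and (1, m, c) over Z_q. Routers in the same column are linked when their difference lies in X (even powers of a primitive element ξ) or in X' (odd powers). (0, x, y) links to (1, m, c) iff y = m·x + c. This gives 2q² routers, each with network radix (3q − 1)/2. Servers attached to routers are not modeled, and the function names are hypothetical rather than the authors' code.

```python
from collections import deque
from itertools import product


def primitive_root(q: int) -> int:
    """Smallest generator of the multiplicative group of Z_q (q prime)."""
    for g in range(2, q):
        if len({pow(g, e, q) for e in range(1, q)}) == q - 1:
            return g
    raise ValueError(f"no primitive root for q={q}")


def slim_fly_mms(q: int) -> dict:
    """Adjacency sets of an MMS router graph; assumes prime q with q % 4 == 1."""
    is_prime = q >= 2 and all(q % p for p in range(2, int(q ** 0.5) + 1))
    if q < 5 or q % 4 != 1 or not is_prime:
        raise ValueError("this sketch only handles primes q with q % 4 == 1")
    xi = primitive_root(q)
    X = {pow(xi, e, q) for e in range(0, q - 1, 2)}   # even powers: 1, xi^2, ..., xi^(q-3)
    Xp = {pow(xi, e, q) for e in range(1, q - 1, 2)}  # odd powers: xi, xi^3, ..., xi^(q-2)
    adj = {(s, a, b): set() for s in (0, 1) for a, b in product(range(q), repeat=2)}
    # Intra-column links; X and Xp are symmetric (-1 is a square mod q), so edges are undirected.
    for a, b, b2 in product(range(q), repeat=3):
        if (b - b2) % q in X:
            adj[(0, a, b)].add((0, a, b2))
        if (b - b2) % q in Xp:
            adj[(1, a, b)].add((1, a, b2))
    # Cross links: (0, x, y) -- (1, m, c) iff y = m*x + c (mod q).
    for x, y, m in product(range(q), repeat=3):
        c = (y - m * x) % q
        adj[(0, x, y)].add((1, m, c))
        adj[(1, m, c)].add((0, x, y))
    return adj


def diameter(adj: dict) -> int:
    """Largest BFS distance over all router pairs (raises if disconnected)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected")
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    g = slim_fly_mms(5)
    print(len(g), {len(n) for n in g.values()}, diameter(g))  # 50 {7} 2
```

For q = 5 this yields 50 routers of radix 7 with diameter 2. Every router reaches every other in at most two hops, and that property underlies the low-diameter cost argument.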
Taxonomy
Topics: Interconnection Networks and Systems · Software-Defined Networks and 5G · Advanced Memory and Neural Computing
