Spritz: Path-Aware Load Balancing in Low-Diameter Networks
Tommaso Bonato, Ales Kubicek, Abdul Kabbani, Ahmad Ghalayini, Maciej Besta, Torsten Hoefler

TL;DR
Spritz is a sender-based load balancing framework for low-diameter networks like Dragonfly and Slim Fly, improving flow completion times and failover robustness without extra hardware by utilizing standard Ethernet features.
Contribution
The paper introduces Spritz, a sender-based load balancing approach that leverages topology-aware routing and feedback mechanisms for low-diameter datacenter networks.
Findings
Spritz outperforms ECMP, UGAL-L, and prior sender-based approaches by up to 1.8x in flow completion time under AI training and datacenter workloads.
It achieves up to a 25.4x performance improvement under link failures.
It was evaluated in simulation on Dragonfly and Slim Fly topologies with over 1000 endpoints.
Abstract
Low-diameter topologies such as Dragonfly and Slim Fly are increasingly adopted in HPC and datacenter networks, yet existing load balancing techniques either rely on proprietary in-network mechanisms or fail to utilize the full path diversity of these topologies. We introduce Spritz, a flexible sender-based load balancing framework that shifts adaptive topology-aware routing to the endpoints using only standard Ethernet features. We propose two algorithms, Spritz-Scout and Spritz-Spray, that, respectively, explore and adaptively cache efficient paths using ECN, packet trimming, and timeout feedback. Through simulation on Dragonfly and Slim Fly topologies with over 1000 endpoints, Spritz outperforms ECMP, UGAL-L, and prior sender-based approaches by up to 1.8x in flow completion time under AI training and datacenter workloads, while offering robust failover with performance improvements…
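To make the idea of a sender reacting to ECN, trimming, and timeout feedback more concrete, here is a minimal Python sketch of a sender-side path cache. It is not the Spritz-Scout or Spritz-Spray algorithm; the class `PathCache`, the `Feedback` enum, the integer path identifiers, and the eviction policy are all hypothetical and chosen only for illustration.

```python
import random
from collections import deque
from enum import Enum


class Feedback(Enum):
    ACK = "ack"          # delivered without a congestion mark
    ECN = "ecn"          # delivered, but ECN-marked along the way
    TRIM = "trim"        # payload was trimmed by a congested switch
    TIMEOUT = "timeout"  # no response at all (possible link failure)


class PathCache:
    """Hypothetical sender-side cache of path IDs that recently performed well."""

    def __init__(self, num_paths, capacity=4, seed=0):
        self.num_paths = num_paths
        self.good = deque(maxlen=capacity)  # paths worth reusing
        self.blocked = set()                # paths that timed out
        self.rng = random.Random(seed)

    def pick_path(self):
        # Prefer a cached path; otherwise explore a random non-blocked one.
        if self.good:
            return self.good.popleft()
        candidates = [p for p in range(self.num_paths) if p not in self.blocked]
        if not candidates:
            # Every path is blocked: forget the block list and start over.
            self.blocked.clear()
            candidates = list(range(self.num_paths))
        return self.rng.choice(candidates)

    def on_feedback(self, path, feedback):
        if feedback is Feedback.ACK:
            # Clean delivery: remember the path for reuse.
            self.blocked.discard(path)
            if path not in self.good:
                self.good.append(path)
            return
        # Any negative signal evicts the path from the cache.
        if path in self.good:
            self.good.remove(path)
        if feedback is Feedback.TIMEOUT:
            # Treat a timeout as a likely failure and avoid the path entirely.
            self.blocked.add(path)
```

In this sketch a sender would call `pick_path()` before each packet and `on_feedback()` when the corresponding signal arrives. Congestion signals (ECN, trimming) only evict a path from the cache, while a timeout also excludes it from exploration, which is one simple way to separate transient congestion from a suspected failure.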
Taxonomy
Topics: Software-Defined Networks and 5G · Cloud Computing and Resource Management · Advanced Optical Network Technologies
