Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees
Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler

TL;DR
Canary introduces a congestion-aware in-network allreduce algorithm that dynamically balances load to improve performance, using a P4 prototype and simulations to demonstrate up to 40% gains over existing methods.
Contribution
Canary is the first congestion-aware in-network allreduce algorithm. It balances load dynamically, so switches do not need to know in advance the ports from which they will receive data to aggregate.
Findings
Allreduce performance improves by up to 40% over the state of the art.
Effective load balancing reduces congestion during allreduce operations.
Prototype implementation on Tofino switches validates practical feasibility.
Abstract
The allreduce operation is an essential building block for many distributed applications, ranging from the training of deep learning models to scientific computing. In an allreduce operation, data from multiple hosts is aggregated and then broadcast to each host participating in the operation. Allreduce performance can be improved by a factor of two by aggregating the data directly in the network. Switches aggregate data coming from multiple ports before forwarding the partially aggregated result to the next hop. In all existing solutions, each switch needs to know the ports from which it will receive the data to aggregate. However, this forces packets to traverse a predefined set of switches, making these solutions prone to congestion. For this reason, we design Canary, the first congestion-aware in-network allreduce algorithm. Canary uses load balancing algorithms to…
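The aggregation pattern described in the abstract can be illustrated with a small sketch. It models the fixed-tree approach the abstract attributes to existing solutions, not Canary's dynamic trees: each leaf switch sums the vectors arriving on its host-facing ports, a root switch sums the partial results, and the total is broadcast back to every host. The two-level topology, the function name in_network_allreduce, and the hosts_per_leaf parameter are assumptions made for illustration only.

```python
from typing import List


def in_network_allreduce(host_data: List[List[int]], hosts_per_leaf: int) -> List[List[int]]:
    """Simulate sum-allreduce on a fixed two-level switch tree (illustrative only)."""
    # Each leaf switch aggregates the vectors from its own group of hosts.
    # Note: the grouping is fixed up front, which is the "predefined ports"
    # assumption that the abstract says makes existing solutions congestion-prone.
    leaf_partials = []
    for start in range(0, len(host_data), hosts_per_leaf):
        group = host_data[start:start + hosts_per_leaf]
        leaf_partials.append([sum(column) for column in zip(*group)])

    # The root switch aggregates the partially aggregated results from the leaves.
    total = [sum(column) for column in zip(*leaf_partials)]

    # The fully aggregated vector is broadcast back to every participating host.
    return [list(total) for _ in host_data]


# Four hosts, two per leaf switch; every host ends up with the element-wise sum.
hosts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(in_network_allreduce(hosts, hosts_per_leaf=2)[0])  # prints [16, 20]
```

In this sketch each host sends its vector into the network once and receives the result once; the switches do the reduction that hosts would otherwise perform among themselves.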
