Randomized Local Fast Rerouting for Datacenter Networks with Almost Optimal Congestion
Gregor Bankhamer, Robert Elsässer, Stefan Schmid

TL;DR
This paper introduces a randomized local rerouting algorithm for Clos datacenter networks that achieves near-optimal congestion under multiple link failures, ensuring high availability and resilience.
Contribution
It proposes a novel decentralized rerouting method for Clos networks that guarantees low congestion with limited local failure information.
Findings
Achieves asymptotically minimal congestion as long as the number of failures at each node stays below a certain bound.
Provides theoretical bounds and guarantees for rerouting performance.
Validates effectiveness through rigorous analysis and proofs.
Abstract
To ensure high availability, datacenter networks must rely on local fast rerouting mechanisms that allow routers to react quickly to link failures in a fully decentralized manner. However, configuring these mechanisms to provide high resilience against multiple failures while avoiding congestion along failover routes is algorithmically challenging, as the rerouting rules can only depend on local failure information and must be defined ahead of time. This paper presents a randomized local fast rerouting algorithm for Clos networks, the predominant datacenter topologies. Given a graph describing a Clos topology, our algorithm defines local routing rules for each node, which only depend on the packet's destination and are conditioned on the incident link failures. We prove that as long as the number of failures at each node does not exceed a certain bound, our algorithm…
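To make the idea of a local, failure-conditioned routing rule concrete, here is a minimal Python sketch. It is not the paper's algorithm: it only illustrates a generic rule in which a node looks up the packet's destination, discards next hops whose incident link has failed, and picks uniformly at random among the rest. The function name pick_next_hop, the table layout, and the node names (rack7, spine0, and so on) are all hypothetical and chosen for illustration.

```python
import random


def pick_next_hop(candidates, failed_links, rng):
    """Pick a uniformly random next hop whose incident link is still up.

    candidates:   neighbor ids that lead toward the destination
                  (a hypothetical precomputed table entry).
    failed_links: neighbor ids whose link to this node is down; this is
                  the only failure information the node uses.
    Returns None if every candidate link has failed.
    """
    alive = [n for n in candidates if n not in failed_links]
    if not alive:
        return None
    return rng.choice(alive)


# Hypothetical table at one node: destination -> candidate next hops.
table = {"rack7": ["spine0", "spine1", "spine2", "spine3"]}

rng = random.Random(0)           # seeded only to make the demo repeatable
failed = {"spine1", "spine3"}    # locally observed failed incident links

# Route 1000 packets for the same destination under the same local failures.
hops = [pick_next_hop(table["rack7"], failed, rng) for _ in range(1000)]
```

Under these assumptions the decision depends only on the destination and on the node's own incident failures, and the random choice spreads traffic across the surviving links instead of sending it all over one fixed backup. Which probabilities and candidate sets actually yield the congestion bound is the subject of the paper and is not captured by this sketch.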
