CAFT: Congestion-Aware Fault-Tolerant Load Balancing for Three-Tier Clos Data Centers
Sultan Alanazi, Bechir Hamdaoui

TL;DR
CAFT is a congestion-aware, fault-tolerant load balancing protocol for 3-tier data centers that improves throughput and reduces latency by using real-time congestion data for path selection.
Contribution
It introduces a distributed protocol that collects real-time congestion information and uses two candidate paths for robust, efficient load balancing in asymmetric data center networks.
Findings
CAFT outperforms Expeditus in mean flow completion time.
CAFT achieves higher network throughput.
CAFT maintains robustness in asymmetric topologies.
Abstract
Production data centers operate under various workload sizes ranging from latency-sensitive mice flows to long-lived elephant flows. However, the predominant load balancing scheme in data center networks, equal-cost multi-path (ECMP), is agnostic to path conditions and performs poorly in asymmetric topologies, resulting in low throughput and high latencies. In this paper, we propose CAFT, a distributed congestion-aware fault-tolerant load balancing protocol for 3-tier data center networks. It first collects, in real time, the complete congestion information of two subsets from the set of all possible paths between any two hosts. Then, the best path congestion information from each subset is carried across the switches, during the Transmission Control Protocol (TCP) connection process, to make the path selection decision. Having two candidate paths improves the robustness of CAFT to asymmetries…
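The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Path` type, the scalar `congestion` metric, the use of `None` to mark a failed path, and the function names are all hypothetical choices made for illustration. The sketch contrasts an ECMP-style hash pick, which ignores path conditions, with choosing the less congested of the best paths from two candidate subsets.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Path:
    hops: Tuple[str, ...]        # hypothetical switch names along the path
    congestion: Optional[float]  # assumed metric (lower is better); None = path failed


def ecmp_pick(paths: Sequence[Path], flow_id: str) -> Path:
    """ECMP-style choice: hash the flow identifier, ignoring congestion."""
    return paths[zlib.crc32(flow_id.encode()) % len(paths)]


def best_of(subset: Sequence[Path]) -> Optional[Path]:
    """Least congested usable path in one subset, or None if all have failed."""
    usable = [p for p in subset if p.congestion is not None]
    return min(usable, key=lambda p: p.congestion) if usable else None


def select_path(subset_a: Sequence[Path], subset_b: Sequence[Path]) -> Optional[Path]:
    """Compare the best candidate from each subset and keep the less congested one."""
    candidates = [c for c in (best_of(subset_a), best_of(subset_b)) if c is not None]
    return min(candidates, key=lambda p: p.congestion) if candidates else None


# Illustrative data only; the values are made up, not measurements from the paper.
subset_a = [
    Path(("agg1", "core1", "agg3"), 0.8),
    Path(("agg1", "core2", "agg3"), 0.3),
]
subset_b = [
    Path(("agg2", "core3", "agg4"), None),  # failed path (asymmetric topology)
    Path(("agg2", "core4", "agg4"), 0.5),
]

chosen = select_path(subset_a, subset_b)
fallback = select_path([Path(("agg1", "core1", "agg3"), None)], subset_b)
```

In this sketch, keeping a candidate from each subset means that when every path in one subset has failed, the candidate from the other subset is still available, which is one way to read the abstract's robustness claim.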
Taxonomy
Topics: Software-Defined Networks and 5G · Cloud Computing and Resource Management · Interconnection Networks and Systems
