Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs
Benjamin Fuhrer, Yuval Shpigelman, Chen Tessler, Shie Mannor, Gal Chechik, Eitan Zahavi, Gal Dalal

TL;DR
This paper presents a lightweight reinforcement learning-based congestion control method deployed on NVIDIA NICs. It meets real-time decision-making constraints and outperforms traditional algorithms across multiple network benchmarks.
Contribution
It distills a neural network policy into decision trees for fast inference, enabling RL-based congestion control to run on network hardware in real time.
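The distillation step can be illustrated with a minimal sketch of plain supervised distillation: label sampled states with a "teacher" network's output, then fit a depth-limited regression tree to those labels. Everything here is hypothetical and for illustration only: the two state features, the fixed-weight teacher MLP, and the CART-style tree learner are assumptions, not the paper's network, features, or exact distillation procedure. The point it shows is why trees suit constrained hardware: one inference is at most `max_depth` comparisons, with no multiply-accumulate layers.

```python
"""Hypothetical sketch: distill a small neural-network policy into a
regression tree by fitting the tree to the network's outputs."""
import math
import random

# Hypothetical teacher: a tiny fixed-weight MLP mapping two illustrative
# state features (e.g. normalized RTT inflation, current rate) to an action.
W1 = [[1.5, -0.8], [-1.2, 0.6], [0.7, 1.1], [-0.4, -1.3]]
B1 = [0.1, -0.2, 0.0, 0.3]
W2 = [0.9, -1.1, 0.5, 0.7]
B2 = -0.05


def teacher(x):
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def mean(ys):
    return sum(ys) / len(ys)


def fit_tree(X, y, depth=0, max_depth=4, min_leaf=5):
    """Greedy regression tree that minimizes the sum of squared errors."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return ("leaf", mean(y))
    best = None  # (cost, feature, threshold)
    for f in range(len(X[0])):
        order = sorted(range(len(y)), key=lambda i: X[i][f])
        ys = [y[i] for i in order]
        n = len(ys)
        total_s, total_q = sum(ys), sum(v * v for v in ys)
        s = q = 0.0  # running sum and sum of squares of the left part
        for k in range(1, n):
            s += ys[k - 1]
            q += ys[k - 1] ** 2
            if k < min_leaf or n - k < min_leaf:
                continue
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:  # cannot split between equal feature values
                continue
            cost = (q - s * s / k) + ((total_q - q) - (total_s - s) ** 2 / (n - k))
            if best is None or cost < best[0]:
                best = (cost, f, (lo + hi) / 2)
    if best is None:
        return ("leaf", mean(y))
    _, f, thr = best
    left = [i for i in range(len(y)) if X[i][f] <= thr]
    right = [i for i in range(len(y)) if X[i][f] > thr]
    return ("node", f, thr,
            fit_tree([X[i] for i in left], [y[i] for i in left],
                     depth + 1, max_depth, min_leaf),
            fit_tree([X[i] for i in right], [y[i] for i in right],
                     depth + 1, max_depth, min_leaf))


def predict(tree, x):
    # At most max_depth comparisons: this is the cheap inference path.
    while tree[0] == "node":
        _, f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree[1]


random.seed(0)
X_train = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(400)]
y_train = [teacher(x) for x in X_train]  # teacher labels = distillation targets
tree = fit_tree(X_train, y_train)

X_test = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y_test = [teacher(x) for x in X_test]
mse_tree = mean([(predict(tree, x) - t) ** 2 for x, t in zip(X_test, y_test)])
var_teacher = mean([(t - mean(y_test)) ** 2 for t in y_test])
print(f"tree-vs-teacher test MSE: {mse_tree:.4f} (teacher variance {var_teacher:.4f})")
```

Fidelity is measured against the teacher's outputs rather than against ground-truth rewards, which is what makes this distillation rather than training a tree from scratch; a deeper tree trades more comparisons for closer agreement with the network.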
Findings
500x reduction in inference time
Outperforms traditional CC algorithms on all benchmarks
Balances bandwidth, latency, and packet drops effectively
Abstract
As communication protocols evolve, datacenter network utilization increases. As a result, congestion is more frequent, causing higher latency and packet loss. Combined with the increasing complexity of workloads, this makes the manual design of congestion control (CC) algorithms extremely difficult and calls for AI approaches to replace the human effort. Unfortunately, it is currently not possible to deploy AI models on network devices due to their limited computational capabilities. Here, we address this problem by building a computationally light solution based on a recent reinforcement learning CC algorithm [arXiv:2207.02295]. We reduce the inference time of RL-CC by 500x by distilling its complex neural network into decision trees. This transformation enables real-time inference within the μ-sec decision-time requirement, with a negligible effect on…
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software-Defined Networks and 5G
