GraphCC: A Practical Graph Learning-based Approach to Congestion Control in Datacenters
Guillermo Bernárdez, José Suárez-Varela, Xiang Shi, Shihan Xiao, Xiangle Cheng, Pere Barlet-Ros, Albert Cabellos-Aparicio

TL;DR
GraphCC is a machine learning framework that dynamically optimizes congestion control in data center networks using multi-agent reinforcement learning and graph neural networks, adapting to changing network conditions for improved performance.
Contribution
It introduces a distributed, ML-based approach combining MARL and GNNs for in-network ECN parameter tuning, enhancing adaptability and performance over existing methods.
Findings
Outperforms state-of-the-art MARL-based ECN tuning solutions in diverse scenarios.
Achieves up to 20% reduction in Flow Completion Time.
Reduces buffer occupancy by 38% to 85.7%.
Abstract
Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCN). Currently, DCNs mainly implement two CC protocols: DCTCP and DCQCN. Both protocols, and their main variants, are based on Explicit Congestion Notification (ECN), where intermediate switches mark packets when they detect congestion. The ECN configuration is thus crucial to the performance of CC protocols. Nowadays, network experts set static ECN parameters carefully selected to optimize the average network performance. However, today's high-speed DCNs experience quick and abrupt changes that severely alter the network state (e.g., dynamic traffic workloads, incast events, failures). This leads to under-utilization and sub-optimal performance. This paper presents GraphCC, a novel Machine Learning-based framework for in-network CC optimization. Our distributed solution…
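To make the role of the ECN configuration concrete, here is a minimal sketch of threshold-based ECN marking at a switch queue. It assumes the RED-style scheme commonly paired with DCQCN, with two queue thresholds and a maximum marking probability; the names `k_min`, `k_max`, `p_max`, and both function names are illustrative assumptions and do not come from the abstract. It is not GraphCC's implementation, only an illustration of the kind of static parameters that an ECN tuning approach would adjust.

```python
import random


def ecn_mark_probability(queue_len, k_min, k_max, p_max):
    """Return the probability of ECN-marking a packet (hypothetical helper).

    Assumed RED-style rule: never mark below k_min, always mark at or
    above k_max, and ramp linearly from 0 to p_max in between.
    """
    if queue_len <= k_min:
        return 0.0
    if queue_len >= k_max:
        return 1.0
    return p_max * (queue_len - k_min) / (k_max - k_min)


def should_mark(queue_len, k_min, k_max, p_max, rng=random):
    """Draw a random marking decision for one packet."""
    return rng.random() < ecn_mark_probability(queue_len, k_min, k_max, p_max)


if __name__ == "__main__":
    # Illustrative values only (queue lengths in KB); not taken from the paper.
    for q in (50, 250, 500):
        print(q, ecn_mark_probability(q, k_min=100, k_max=400, p_max=0.2))
```

With static values, the same thresholds apply whether the queue is absorbing a short incast burst or a sustained overload, which is the limitation the abstract attributes to fixed ECN settings.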
Taxonomy
Topics: Software-Defined Networks and 5G · Cloud Computing and Resource Management · Advanced Memory and Neural Computing
