Improving dynamic congestion isolation in data-center networks
Alberto Merino, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles

TL;DR
This paper introduces ICI, a new congestion control mechanism that combines CI and DCQCN to reduce false congestion signals, improve latency, and enhance network efficiency in data-center networks.
Contribution
The paper proposes ICI, a novel congestion isolation method that effectively coordinates CI and DCQCN, reducing false positives and improving responsiveness in data-center networks.
Findings
ICI reduces BECN signals by up to 32x.
ICI improves tail latency by up to 31%.
ICI maintains high throughput and scalability.
Abstract
The rise of distributed AI and large-scale applications has changed the communication patterns of data-center and supercomputer interconnection networks, leading to dramatic incast or in-network congestion scenarios that challenge existing congestion control mechanisms, such as injection throttling (e.g., DCQCN) or congestion isolation (CI). DCQCN provides scalable traffic rate adjustment for congesting flows at the end nodes, but it reacts slowly; CI effectively isolates these flows in special network resources, but it requires extra logic in the switches. Their combined use diminishes these individual drawbacks, yet it leads to false identification and signaling of congestion scenarios, excessive throttling, and inefficient use of network resources. In this paper, we propose a new CI mechanism, called Improved Congestion Isolation (ICI), which efficiently combines CI and…
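To illustrate why false congestion signals translate into excessive throttling, the following is a minimal sketch of the sender-side rate decrease used by DCQCN as described in the original DCQCN proposal. It is not the ICI mechanism, whose details are not given in the text above. The function names, the 100 Gb/s starting rate, and the choice to omit the rate-recovery phase are assumptions made for illustration; the gain g = 1/256 and the initial alpha = 1 are the commonly cited DCQCN defaults.

```python
def dcqcn_on_notification(rate, alpha, g=1 / 256):
    """Apply one DCQCN rate decrease at the sender.

    rate  : current sending rate (any unit, e.g. Gb/s)
    alpha : sender's running estimate of congestion severity, in [0, 1]
    g     : gain used to update alpha (1/256 is the commonly cited default)

    Returns (new_rate, target_rate, new_alpha). The target rate remembers
    the pre-cut rate so that a later recovery phase (omitted here) can
    climb back toward it.
    """
    target_rate = rate
    new_rate = rate * (1 - alpha / 2)   # multiplicative decrease
    new_alpha = (1 - g) * alpha + g     # congestion estimate moves toward 1
    return new_rate, target_rate, new_alpha


def rate_after_notifications(rate, n, alpha=1.0):
    """Rate after n back-to-back notifications with no recovery in between."""
    for _ in range(n):
        rate, _, alpha = dcqcn_on_notification(rate, alpha)
    return rate


if __name__ == "__main__":
    # Hypothetical flow at 100 Gb/s that receives three notifications in a
    # row, for example because it was wrongly identified as congesting.
    for n in range(4):
        print(n, rate_after_notifications(100.0, n))
```

Under these assumptions each notification halves the rate while alpha remains at 1, so a flow that is wrongly signaled three times in a row drops from 100 to 12.5 Gb/s before any recovery begins. This is why reducing spurious notifications matters for both latency and link utilization.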
Taxonomy
Topics: Software-Defined Networks and 5G · Cloud Computing and Resource Management · Network Traffic and Congestion Control
