Improving Injection-Throttling Mechanisms for Congestion Control for Data-center and Supercomputer Interconnects
Cristina Olmedilla, Jesus Escudero-Sahuquillo, Pedro J. Garcia, Francisco J. Quiles, Jose Duato

TL;DR
This paper enhances congestion control mechanisms in high-speed data-center and supercomputer networks by refining the DCQCN protocol to improve detection accuracy, signaling efficiency, and throttling precision, thereby reducing overhead and unnecessary flow restrictions.
Contribution
The paper introduces a refined DCQCN-based congestion control mechanism with improved detection, signaling, and throttling, addressing limitations of existing methods in modern high-performance networks.
Findings
Reduced control traffic overhead
More accurate congestion detection
Avoided unnecessary flow throttling
Abstract
Over the past decade, supercomputers and data centers have evolved dramatically to cope with the increasing performance requirements of applications and services, such as scientific computing, generative AI, social networks, or cloud services. This evolution has led these systems to incorporate high-speed networks using faster links, end nodes using multiple and dedicated accelerators, or advancements in memory technologies to bridge the memory bottleneck. The interconnection network is a key element in these systems, and it must be thoroughly designed so that it is not the bottleneck of the entire system, bearing in mind the countless communication operations that current applications and services generate. Congestion is a serious threat that spoils the interconnection network performance, and its effects are even more dramatic when looking at the traffic dynamics and bottlenecks generated…
Taxonomy
Topics: Interconnection Networks and Systems · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
