DCFIT: Initial Trigger-Based PFC Deadlock Detection in the Data Plane
Xinyu Crystal Wu, T.S. Eugene Ng

TL;DR
This paper introduces DCFIT, a data plane mechanism that quickly detects and resolves PFC-induced deadlocks in lossless data center networks and helps prevent their recurrence. It serves as a last line of defense for when deadlock avoidance strategies fail.
Contribution
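The paper presents DCFIT, a novel trigger-based deadlock detection method that runs entirely in the data plane and works with arbitrary network topologies and routing protocols. Its distinguishing feature is the use of deadlock initial triggers, which support both efficient detection and prevention of deadlock recurrence.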
Findings
Detects deadlocks quickly with minimal overhead (preliminary results)
Mitigates recurrence of the same deadlocks
Works across arbitrary network topologies and routing protocols
Abstract
Recent data center applications rely on lossless networks to achieve high network performance. Lossless networks, however, can suffer from in-network deadlocks induced by hop-by-hop flow control protocols like PFC. Once deadlocks occur, large parts of the network could be blocked. Existing solutions mainly center on a deadlock avoidance strategy; unfortunately, they are not foolproof. Thus, deadlock detection is a necessary last resort. In this paper, we propose DCFIT, a new mechanism performed entirely in the data plane to detect and solve deadlocks for arbitrary network topologies and routing protocols. Unique to DCFIT is the use of deadlock initial triggers, which contribute to efficient deadlock detection and deadlock recurrence prevention. Preliminary results indicate that DCFIT can detect deadlocks quickly with minimal overhead and mitigate the recurrence of the same deadlocks…
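To make the failure mode concrete: a PFC deadlock forms when paused queues wait on one another in a cycle (a cyclic buffer dependency), so no queue in the cycle can ever drain. The sketch below is a minimal, centralized illustration of that condition, not DCFIT's data-plane mechanism or its initial-trigger logic, which the abstract does not detail. It assumes a simplified, hypothetical model in which each paused queue waits on exactly one downstream queue. The function name `find_pause_cycle` and the `switch.port` labels are illustrative only.

```python
from typing import Dict, List, Optional


def find_pause_cycle(waits_on: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle of mutually paused queues, or None if there is none.

    Hypothetical model: waits_on[q] = r means queue q cannot drain because
    downstream queue r has sent it a PFC PAUSE. Each queue waits on at most
    one other queue, so following waits_on from any queue is a single walk.
    """
    finished = set()  # queues already explored; no cycle is reachable from them
    for start in waits_on:
        if start in finished:
            continue
        position: Dict[str, int] = {}  # queue -> index on the current walk
        path: List[str] = []
        q: Optional[str] = start
        # Follow the pause chain until it ends, hits explored queues, or loops.
        while q is not None and q not in finished and q not in position:
            position[q] = len(path)
            path.append(q)
            q = waits_on.get(q)
        finished.update(path)
        if q is not None and q in position:
            # The walk returned to a queue on this path: that loop is a deadlock.
            return path[position[q]:]
    return None
```

For example, three switches pausing each other in a ring form a deadlock, while a straight pause chain that ends at a queue that can drain does not:

```python
find_pause_cycle({"S1.p1": "S2.p1", "S2.p1": "S3.p1", "S3.p1": "S1.p1"})
find_pause_cycle({"A": "B", "B": "C"})
```

The first call returns the three queues in the ring; the second returns `None`.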
Taxonomy
Topics: Interconnection Networks and Systems · Software-Defined Networks and 5G · Cloud Computing and Resource Management
