Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model
Tianyi Wang, Huawei Fan, Yuanchao Shu, Peng Cheng, Cong Wang

TL;DR
This paper shifts focus from algorithmic to system-level latency attacks on LLM serving frameworks, introducing a new Fill and Squeeze attack strategy that exploits system behaviors to cause significant slowdowns.
Contribution
The paper presents a novel system-level attack method against LLM serving systems, demonstrating its effectiveness over existing algorithmic latency attacks.
Findings
Achieves 20-280x slowdown in Time to First Token
Attacks are effective with lower cost and in black-box settings
System optimizations such as continuous batching largely neutralize traditional algorithmic latency attacks
Abstract
Large Language Models face an emerging and critical threat known as latency attacks. Because LLM inference is inherently expensive, even modest slowdowns can translate into substantial operating costs and severe availability risks. Recently, a growing body of research has focused on algorithmic complexity attacks, which craft inputs that trigger worst-case output lengths. However, we report a counter-intuitive finding: these algorithmic latency attacks are largely ineffective against modern LLM serving systems. We reveal that system-level optimizations such as continuous batching provide logical isolation that mitigates the contagious latency impact on co-located users. We therefore shift the focus from the algorithm to the system layer and introduce a new Fill and Squeeze attack strategy targeting the state transition of the scheduler. "Fill" first exhausts the global KV…
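The isolation argument in the abstract can be illustrated with a toy scheduler model. The sketch below is not the paper's method and does not use any real serving framework's API. It assumes a simplified engine in which every request reserves a fixed number of blocks from a shared, finite memory pool for its whole lifetime, admission is first-come-first-served, and each running request emits one token per step. All names (`Request`, `simulate`, `total_blocks`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival: int        # step at which the request enters the queue
    output_tokens: int  # number of decode steps the request needs
    blocks: int         # memory blocks held for its whole lifetime (toy assumption)


def simulate(requests, total_blocks):
    """Toy continuous-batching loop; returns {name: step of first token}."""
    if any(r.blocks > total_blocks or r.output_tokens < 1 for r in requests):
        raise ValueError("request can never be scheduled in this toy model")
    pending = deque(sorted(requests, key=lambda r: r.arrival))
    waiting = deque()   # arrived but not yet admitted
    running = []        # [request, tokens_remaining] pairs in the current batch
    free = total_blocks
    first_token = {}
    step = 0
    while pending or waiting or running:
        # Move newly arrived requests into the waiting queue.
        while pending and pending[0].arrival <= step:
            waiting.append(pending.popleft())
        # FCFS admission: the queue head joins the batch only if blocks are free.
        while waiting and waiting[0].blocks <= free:
            req = waiting.popleft()
            free -= req.blocks
            running.append([req, req.output_tokens])
        # One decode iteration: every running request emits exactly one token.
        still_running = []
        for req, remaining in running:
            first_token.setdefault(req.name, step)
            remaining -= 1
            if remaining == 0:
                free += req.blocks  # finished: release its blocks
            else:
                still_running.append([req, remaining])
        running = still_running
        step += 1
    return first_token


def workload():
    # A long-output request followed one step later by a short "victim" request.
    return [Request("long", 0, 50, 4), Request("victim", 1, 5, 4)]


roomy = simulate(workload(), total_blocks=16)
tight = simulate(workload(), total_blocks=4)
print(roomy["victim"])  # 1: the victim joins the batch as soon as it arrives
print(tight["victim"])  # 50: the victim waits until the long request frees its blocks
```

Under these assumptions, a long-running request alone does not delay a co-located request while spare blocks remain, because the newcomer simply joins the running batch. The delay to first token only appears once the shared pool is full, which is why memory pressure rather than output length drives the slowdown in this model.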
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Software System Performance and Reliability
