Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model

Tianyi Wang; Huawei Fan; Yuanchao Shu; Peng Cheng; Cong Wang

arXiv:2602.07878 · cs.CR · February 10, 2026

Open Access

TL;DR

This paper shifts the focus of latency attacks from the algorithm to the system layer of LLM serving frameworks. It introduces a Fill and Squeeze attack strategy that targets the scheduler's state transitions and causes a 20-280x slowdown in Time to First Token.

Contribution

The paper presents a novel system-level attack method against LLM serving systems, demonstrating its effectiveness over existing algorithmic latency attacks.

Findings

01. Achieves a 20-280x slowdown in Time to First Token (TTFT)

02. Attacks remain effective at lower cost and in black-box settings

03. System-level optimizations such as continuous batching mitigate existing algorithmic latency attacks

Abstract

Large Language Models face an emerging and critical threat known as latency attacks. Because LLM inference is inherently expensive, even modest slowdowns can translate into substantial operating costs and severe availability risks. Recently, a growing body of research has focused on algorithmic complexity attacks that craft inputs to trigger worst-case output lengths. However, we report a counter-intuitive finding: these algorithmic latency attacks are largely ineffective against modern LLM serving systems. We reveal that system-level optimizations such as continuous batching provide logical isolation that mitigates the contagious latency impact on co-located users. To this end, in this paper, we shift the focus from the algorithm to the system layer and introduce a new Fill and Squeeze attack strategy targeting the state transition of the scheduler. "Fill" first exhausts the global KV…
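
The isolation effect attributed to continuous batching can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the paper's experimental setup: one decode step produces one token per running request, prefill cost and KV cache limits are ignored, and the function names `static_batching` and `continuous_batching` are hypothetical. The static-batching baseline is included only for contrast; it is not taken from the abstract.

```python
from collections import deque


def static_batching(output_lens, batch_size):
    """Finish step per request when each batch runs until its longest member is done."""
    finish = [0] * len(output_lens)
    clock = 0
    for start in range(0, len(output_lens), batch_size):
        batch = range(start, min(start + batch_size, len(output_lens)))
        # The whole batch occupies the engine for as long as its longest request.
        clock += max(output_lens[i] for i in batch)
        for i in batch:
            finish[i] = clock
    return finish


def continuous_batching(output_lens, batch_size):
    """Finish step per request when slots are refilled after every decode step."""
    finish = [0] * len(output_lens)
    waiting = deque(range(len(output_lens)))
    running = {}  # request index -> tokens still to generate
    clock = 0
    while waiting or running:
        # Admit queued requests into any free slots before the next step.
        while waiting and len(running) < batch_size:
            i = waiting.popleft()
            running[i] = output_lens[i]
        clock += 1
        for i in list(running):
            running[i] -= 1
            if running[i] <= 0:
                finish[i] = clock
                del running[i]
    return finish


# Request 0 stands in for a long-output "algorithmic" attack; the rest are short.
lens = [1000, 10, 10, 10]
static_finish = static_batching(lens, batch_size=2)
continuous_finish = continuous_batching(lens, batch_size=2)
print(static_finish)      # [1000, 1000, 1010, 1010]
print(continuous_finish)  # [1000, 10, 20, 30]
```

In this toy model the long request delays every co-located request under static batching, while under continuous batching the short requests finish at steps 10, 20, and 30 regardless of the long one. The sketch deliberately leaves out the shared KV cache, which is the resource the abstract says the "Fill" phase targets.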

Taxonomy

Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Software System Performance and Reliability