Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing
Yunze Zhao, Yibo Zhao, Yuchen Zhang, Zaoxing Liu, Michelle L. Mazurek

TL;DR
This paper introduces GRIEF, a greybox fuzzer that detects security and reliability vulnerabilities in LLM inference systems caused by shared-state concurrency and caching, revealing issues missed by standard tests.
Contribution
The paper presents GRIEF, a novel fuzzing approach targeting LLM serving systems' concurrency and shared-state behaviors, uncovering 15 previously unknown vulnerabilities including CVEs.
Findings
GRIEF discovered 15 vulnerabilities in LLM serving systems.
10 vulnerabilities were confirmed by engine developers, including 2 CVEs.
Shared-state behaviors in the serving layer can cause silent output corruption, performance pathologies, hangs, and crashes.
Abstract
LLM inference and serving systems have become security-critical infrastructure; however, many of their most concerning failures arise from the serving layer rather than from model behavior alone. Modern inference engines combine KV cache, batching, prefix sharing, speculative decoding, adapters, and multi-tenant scheduling, creating shared-state behavior that only emerges under realistic concurrent workloads and is missed by standard model, safety, and API tests. We present GRIEF, a greybox fuzzer for LLM inference engines that treats timed multi-request traces as first-class inputs, uses lightweight oracles to detect crashes, hangs, performance pathologies, and silent output corruption, and applies controlled replay with log-probability checks to confirm reproducible serving-layer failures. Across early campaigns on vLLM and SGLang, GRIEF discovers 15 vulnerabilities, 10 confirmed by…
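To make the abstract's two central ideas concrete, here is a minimal sketch of a timed multi-request trace and a log-probability oracle. It is not GRIEF's implementation: the names `TimedRequest`, `schedule`, `logprob_divergence`, and `is_contaminated`, and the tolerance value, are hypothetical and chosen only for illustration. It assumes per-token log-probabilities for a request are available from two runs, one in isolation and one replayed inside a concurrent trace.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TimedRequest:
    """One request in a trace (hypothetical structure, not GRIEF's format)."""
    offset_ms: int        # send time relative to the start of the trace
    prompt: str
    max_tokens: int = 16


def schedule(trace: Sequence[TimedRequest]) -> List[TimedRequest]:
    """Return the requests in the order they should be sent."""
    return sorted(trace, key=lambda r: r.offset_ms)


def logprob_divergence(baseline: Sequence[float],
                       observed: Sequence[float]) -> float:
    """Largest per-token gap between two log-probability sequences.

    A length mismatch is treated as infinite divergence, because the
    two runs did not even produce the same number of tokens.
    """
    if len(baseline) != len(observed):
        return float("inf")
    return max((abs(b - o) for b, o in zip(baseline, observed)), default=0.0)


def is_contaminated(baseline: Sequence[float],
                    observed: Sequence[float],
                    tol: float = 1e-2) -> bool:
    """Flag a request whose concurrent-run log-probs drift past `tol`.

    `tol` is an illustrative threshold; a real oracle would have to
    calibrate it against benign numerical noise from batching.
    """
    return logprob_divergence(baseline, observed) > tol


# Example: two requests sharing a prefix, sent 5 ms apart.
trace = [
    TimedRequest(offset_ms=5, prompt="System: be brief.\nUser: hi"),
    TimedRequest(offset_ms=0, prompt="System: be brief.\nUser: hello"),
]
ordered = schedule(trace)

isolated = [-0.10, -1.20, -0.35]          # log-probs from an isolated run
concurrent_ok = [-0.10, -1.2004, -0.35]   # small numerical noise
concurrent_bad = [-0.90, -1.20, -0.35]    # first token shifted noticeably
```

In this sketch, `is_contaminated(isolated, concurrent_ok)` is false and `is_contaminated(isolated, concurrent_bad)` is true. Comparing against an isolated baseline is one simple way to separate serving-layer effects from model behavior, since the model and prompt are identical in both runs and only the surrounding workload differs.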
