PrefixWall: Mitigating Prefix Caching Side Channels in Shared LLM Systems
Panagiotis Georgios Pennas, Konstantinos Papaioannou, Marco Guarnieri, Thaleia Dimitra Doudali

TL;DR
PrefixWall defends shared LLM inference against prefix-caching timing side channels by monitoring cache reuse and selectively isolating it, while preserving performance and efficiency.
Contribution
The paper introduces an approach that secures multi-tenant LLM serving systems against prefix-cache side channels without sacrificing inference speed or efficiency.
Findings
PrefixWall achieves up to 70% higher cache reuse.
It reduces inference latency by 30% compared to existing defenses.
The system effectively detects and isolates suspicious cache sharing.
Abstract
Large Language Models (LLMs) rely on optimizations like Automatic Prefix Caching (APC) to accelerate inference. APC reuses previously computed states for the beginning of a request (the prefix) when another request starts with the same text. While APC improves throughput, it introduces timing side channels: cache hits are faster than misses, creating observable latency differences. In multi-tenant systems, attackers can exploit these differences to infer sensitive information, e.g., by incrementally reconstructing another user's request from observed hit/miss patterns. Current defenses take a sledgehammer approach: they disable APC and cache sharing, isolating users and sacrificing efficiency for regular users. This paper presents PrefixWall, a system that secures multi-tenant LLM serving systems against APC side channels without sacrificing performance and efficiency.…
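The hit/miss asymmetry described in the abstract can be illustrated with a toy model. The sketch below is not PrefixWall and not any serving system's actual implementation: the class name `ToyPrefixCache`, the block size, and the use of "blocks computed" as a stand-in for latency are all assumptions made for illustration only.

```python
from typing import List, Set, Tuple

BLOCK = 4  # tokens per cache block (illustrative value, an assumption)


class ToyPrefixCache:
    """Toy shared prefix cache keyed by block-aligned token prefixes."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, ...]] = set()

    def serve(self, tokens: List[str]) -> Tuple[int, int]:
        """Process one request and return (reused_blocks, computed_blocks).

        Reuse only counts while every earlier block was also a hit, since
        cached state is valid only for an identical prefix. Trailing tokens
        that do not fill a whole block are ignored in this toy model.
        """
        reused = computed = 0
        still_matching = True
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            key = tuple(tokens[:end])
            if still_matching and key in self._seen:
                reused += 1
            else:
                still_matching = False
                self._seen.add(key)
                computed += 1
        return reused, computed


cache = ToyPrefixCache()

# One tenant's request populates the shared cache (3 blocks, all computed).
victim = "a b c d e f g h i j k l".split()
victim_cost = cache.serve(victim)

# A second tenant's request sharing the first 8 tokens reuses 2 blocks,
# so less work is done; in a real system this shows up as lower latency.
probe_match = "a b c d e f g h w x y z".split()
match_cost = cache.serve(probe_match)

# A request that differs in the first token reuses nothing.
probe_miss = "q b c d e f g h i j k l".split()
miss_cost = cache.serve(probe_miss)
```

In this model the second tenant learns how many leading blocks it shares with an earlier request purely from how much work its own request needed, which is the kind of observable difference the abstract refers to. Disabling sharing entirely would force every request down the all-computed path, which is the efficiency cost the abstract attributes to current defenses.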
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Caching and Content Delivery
