Whose Narrative is it Anyway? A KV Cache Manipulation Attack
Mukkesh Ganesh, Kaushik Iyer, Arun Baalaaji Sankar Ananthan

TL;DR
This paper presents 'History Swapping,' a novel attack on the KV cache of LLMs that steers the topic of generated text without changing the input prompt, exposing security vulnerabilities in how model state is managed.
Contribution
Introduces a new block-level attack on the KV cache that steers LLM outputs, exposing security risks and yielding insights into the model's internal representations.
Findings
Full-layer cache overwrites can hijack conversation topics
High-level structural plans are encoded early in generation
Local discourse structure is maintained by final layers
Abstract
The Key-Value (KV) cache is an important component for efficient inference in autoregressive Large Language Models (LLMs), but its role as a representation of the model's internal state makes it a potential target for integrity attacks. This paper introduces "History Swapping," a novel block-level attack that manipulates the KV cache to steer model generation without altering the user-facing prompt. The attack involves overwriting a contiguous segment of the active generation's cache with a precomputed cache from a different topic. We empirically evaluate this method across 324 configurations on the Qwen 3 family of models, analyzing the impact of timing, magnitude, and layer depth of the cache overwrite. Our findings reveal that only full-layer overwrites can successfully hijack the conversation's topic, leading to three distinct behaviors: immediate and persistent topic shift, partial…
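The overwrite described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the cache layout (a list of per-layer dictionaries holding key and value vectors), the helper names `make_toy_cache` and `history_swap`, and the mapping of `start`, `length`, and `layers` onto the paper's timing, magnitude, and layer-depth factors are all assumptions made for illustration. No real model or inference library is involved.

```python
from copy import deepcopy
from typing import Dict, List, Optional, Sequence

# Hypothetical toy layout: cache[layer]["keys"][position] is one key vector.
Vector = List[float]
LayerCache = Dict[str, List[Vector]]
KVCache = List[LayerCache]


def make_toy_cache(num_layers: int, seq_len: int, fill: float) -> KVCache:
    """Build a toy cache in which every entry is a 2-dimensional vector of `fill`."""
    return [
        {
            "keys": [[fill, fill] for _ in range(seq_len)],
            "values": [[fill, fill] for _ in range(seq_len)],
        }
        for _ in range(num_layers)
    ]


def history_swap(
    active: KVCache,
    donor: KVCache,
    start: int,
    length: int,
    layers: Optional[Sequence[int]] = None,
) -> KVCache:
    """Return a copy of `active` with a contiguous block replaced by `donor` entries.

    start  -- first overwritten position (assumed stand-in for "timing")
    length -- number of overwritten positions (assumed stand-in for "magnitude")
    layers -- layer indices to overwrite; None means every layer ("full-layer")
    """
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    swapped = deepcopy(active)  # leave the caller's cache untouched
    target_layers = range(len(active)) if layers is None else layers
    end = start + length
    for layer in target_layers:
        for name in ("keys", "values"):
            if end > len(swapped[layer][name]) or end > len(donor[layer][name]):
                raise ValueError("overwrite span exceeds cache length")
            # Replace the contiguous segment with the donor's precomputed entries.
            swapped[layer][name][start:end] = deepcopy(donor[layer][name][start:end])
    return swapped


# Toy caches: the active generation holds 0.0 everywhere, the donor topic 1.0.
active = make_toy_cache(num_layers=4, seq_len=6, fill=0.0)
donor = make_toy_cache(num_layers=4, seq_len=6, fill=1.0)

full = history_swap(active, donor, start=1, length=3)                # all layers
last_only = history_swap(active, donor, start=1, length=3, layers=[3])  # one layer
```

In this sketch, `full` corresponds to a full-layer overwrite, the only variant the abstract reports as able to hijack the topic, while `last_only` touches a single layer. The toy only shows which cache entries change; whether generation actually shifts depends on the model and is not simulated here.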
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
