RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse
Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok-Yan Lam

TL;DR
RedVisor is a defense framework for Large Language Models that combines explainability and prevention to detect and reject prompt injection attacks efficiently, with negligible loss of model utility.
Contribution
RedVisor introduces a reasoning-aware, fine-grained approach to prompt injection detection and prevention. It is built on a lightweight, removable adapter and zero-copy KV cache reuse, which together improve detection accuracy and throughput.
Findings
Outperforms state-of-the-art defenses in detection accuracy.
Achieves high throughput with negligible utility loss.
Integrated into the vLLM serving engine.
Abstract
Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face a critical trade-off: prevention-based fine-tuning often degrades general utility via the "alignment tax", while detection-based filtering incurs prohibitive latency and memory costs. To bridge this gap, we propose RedVisor, a unified framework that synthesizes the explainability of detection systems with the seamless integration of prevention strategies. To the best of our knowledge, RedVisor is the first approach to leverage fine-grained reasoning paths to simultaneously detect attacks and guide the model's safe response. We implement this via a lightweight, removable adapter positioned atop the frozen backbone. This adapter serves a dual function: it first generates…
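The abstract does not detail how the adapter and the KV cache interact, and the description is cut off. The toy sketch below illustrates only the general idea of prompt KV cache reuse. It assumes, without confirmation from the source, that the adapter consumes the frozen backbone's outputs and does not alter its internal activations. Under that assumption, the prompt's keys and values are identical with or without the adapter, so a detection pass and a response pass can read one cache by reference instead of re-encoding or copying it. Every name here (KVCache, frozen_backbone_kv, detector_adapter, run_pass) is hypothetical, and the scalar "attention" is a stand-in. This is not RedVisor's implementation.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


def frozen_backbone_kv(token: int) -> Tuple[float, float]:
    """Hypothetical frozen backbone: a deterministic toy key/value for one token."""
    return (0.1 * token, float(token))


@dataclass
class KVCache:
    """Hypothetical prompt KV cache that several passes read by reference."""
    keys: List[float] = field(default_factory=list)
    values: List[float] = field(default_factory=list)
    prefill_calls: int = 0  # counts how many prompt tokens the backbone encoded

    def prefill(self, tokens: List[int]) -> None:
        # Run the frozen backbone over the prompt once and store its keys/values.
        for tok in tokens:
            k, v = frozen_backbone_kv(tok)
            self.keys.append(k)
            self.values.append(v)
            self.prefill_calls += 1


def attend(cache: KVCache, query: float) -> float:
    """Toy scalar attention: softmax-weighted sum of cached values."""
    scores = [query * k for k in cache.keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, cache.values)) / total


def run_pass(cache: KVCache, query: float,
             adapter: Optional[Callable[[float], float]] = None) -> float:
    """One decoding step over the shared cache; the adapter only post-processes output."""
    hidden = attend(cache, query)
    return adapter(hidden) if adapter is not None else hidden


def detector_adapter(hidden: float) -> float:
    """Hypothetical adapter head atop the backbone: returns 1.0 to flag, 0.0 otherwise."""
    return 1.0 if hidden > 5.0 else 0.0


prompt = [3, 7, 2, 9]
cache = KVCache()
cache.prefill(prompt)  # the prompt is encoded exactly once

flag = run_pass(cache, query=1.0, adapter=detector_adapter)  # adapter attached
answer = run_pass(cache, query=1.0)                           # adapter removed
```

Both passes receive the same `cache` object, so the prompt is neither re-encoded nor copied between them. A real serving engine would share per-layer key/value tensors, for example through block references, rather than Python lists.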
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Explainable Artificial Intelligence (XAI)
