RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis

Xue Tan; Hao Luan; Mingyu Luo; Xiaoyan Sun; Ping Chen; Jun Dai

arXiv:2411.18948 · cs.CR · September 1, 2025

TL;DR

RevPRAG is a detection method that uses LLM activation patterns to identify poisoning attacks in retrieval-augmented generation systems, with a reported true positive rate of 98% and a false positive rate close to 1%.

Contribution

This work introduces RevPRAG, the first automated detection pipeline leveraging LLM activations to identify poisoned responses in RAG systems.
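The idea of separating clean from poisoned responses by their activations can be illustrated with a toy example. The sketch below is not the RevPRAG classifier, which this page does not describe; it swaps in a plain nearest-centroid rule, and the three-dimensional "activation" vectors are hypothetical stand-ins for hidden states that would in practice be extracted from an LLM.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidDetector:
    """Toy nearest-centroid detector (an assumption, not the paper's model)."""

    def fit(self, clean, poisoned):
        # Summarize each class by the mean of its activation vectors.
        self.clean_center = centroid(clean)
        self.poisoned_center = centroid(poisoned)
        return self

    def is_poisoned(self, activation):
        # Flag the response if its activation lies closer to the poisoned mean.
        to_poisoned = distance(activation, self.poisoned_center)
        to_clean = distance(activation, self.clean_center)
        return to_poisoned < to_clean


# Hypothetical activation vectors, invented for illustration only.
clean = [[0.1, 0.2, 0.0], [0.0, 0.1, 0.1]]
poisoned = [[0.9, 0.8, 1.0], [1.0, 0.9, 0.8]]

detector = CentroidDetector().fit(clean, poisoned)
print(detector.is_poisoned([0.8, 0.9, 0.9]))
print(detector.is_poisoned([0.05, 0.1, 0.05]))
```

A real pipeline would replace the hand-written vectors with activations collected while the LLM generates its response, and would likely need a stronger classifier than a centroid rule.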

Findings

01. Achieves 98% true positive rate in detecting poisoned responses

02. Maintains false positive rate close to 1%

03. Effective across multiple datasets and RAG architectures
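The true positive rate and false positive rate quoted in the findings are the standard confusion-matrix ratios: TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch of how they could be computed, assuming binary labels where 1 means "poisoned" and 0 means "clean"; the function name and the toy data are hypothetical and are not results from the paper.

```python
def detection_rates(labels, predictions):
    """Return (TPR, FPR) for binary labels where 1 = poisoned, 0 = clean."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # poisoned, flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # poisoned, missed
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # clean, flagged
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # clean, passed
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical toy data: four poisoned and four clean responses.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]

tpr, fpr = detection_rates(labels, predictions)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")  # prints "TPR=0.75 FPR=0.25"
```

A detector is useful only when both numbers are good at once: flagging every response gives a TPR of 100% but also an FPR of 100%.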

Abstract

Retrieval-Augmented Generation (RAG) enriches the input to LLMs by retrieving information from a relevant knowledge database, enabling them to produce responses that are more accurate and contextually appropriate. Because the knowledge database is sourced from publicly available channels such as Wikipedia, it inevitably introduces a new attack surface. RAG poisoning involves injecting malicious texts into the knowledge database, ultimately leading the LLM to generate the attacker's target response (also called the poisoned response). However, few methods are currently available for detecting such poisoning attacks. This work aims to bridge that gap. In particular, we introduce RevPRAG, a flexible and automated detection pipeline that leverages the activations of LLMs for poisoned response detection. Our investigation uncovers distinct patterns in LLMs'…
