Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems
Scott Thornton

TL;DR
This paper investigates how adversaries can poison retrieval corpora in RAG systems to manipulate outputs, and evaluates a hybrid retrieval defense that mitigates such attacks without modifying the underlying LLM.
Contribution
It introduces gradient-guided corpus poisoning attacks against RAG systems and demonstrates a simple hybrid retrieval method that significantly reduces attack success.
Findings
Gradient-guided poisoning achieves a 38% co-retrieval rate under pure vector retrieval.
Hybrid retrieval reduces attack success from 38% to 0%.
Joint optimization attacks partially circumvent hybrid retrieval, with 20-44% success.
Abstract
Retrieval-Augmented Generation (RAG) systems extend large language models (LLMs) with external knowledge sources but introduce new attack surfaces through the retrieval pipeline. In particular, adversaries can poison retrieval corpora so that malicious documents are preferentially retrieved at inference time, enabling targeted manipulation of model outputs. We study gradient-guided corpus poisoning attacks against modern RAG pipelines and evaluate retrieval-layer defenses that require no modification to the underlying LLM. We implement dual-document poisoning attacks consisting of a sleeper document and a trigger document optimized using Greedy Coordinate Gradient (GCG). In a large-scale evaluation on the Security Stack Exchange corpus (67,941 documents) with 50 attack attempts, gradient-guided poisoning achieves a 38.0 percent co-retrieval rate under pure vector retrieval. We show…
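To make the retrieval-layer idea concrete, the following is a minimal sketch of hybrid retrieval and of a co-retrieval metric. It is not the paper's implementation. The min-max normalization, the linear fusion weight `alpha`, the toy scores, the document names, and the reading of "co-retrieval" as both the sleeper and the trigger document appearing in the same top-k list are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_top_k(dense: Dict[str, float], sparse: Dict[str, float],
                 k: int, alpha: float = 0.5) -> List[str]:
    """Fuse dense (vector) and sparse (lexical) scores, return the top-k ids.

    alpha=1.0 reduces to pure vector retrieval (assumed fusion rule).
    """
    d, s = min_max(dense), min_max(sparse)
    fused = {doc: alpha * d[doc] + (1 - alpha) * s.get(doc, 0.0) for doc in d}
    return sorted(fused, key=fused.get, reverse=True)[:k]


def co_retrieval_rate(results: Sequence[List[str]],
                      pairs: Sequence[Set[str]]) -> float:
    """Fraction of attempts whose top-k contains every document of the pair."""
    hits = sum(1 for top, pair in zip(results, pairs) if pair <= set(top))
    return hits / len(results)


# Hypothetical scores: the poisoned pair is tuned for embedding similarity
# but shares few query terms, so its lexical score is low.
dense = {"benign_a": 0.70, "benign_b": 0.65, "sleeper": 0.90, "trigger": 0.88}
sparse = {"benign_a": 8.0, "benign_b": 7.0, "sleeper": 1.0, "trigger": 0.5}

vector_only = hybrid_top_k(dense, sparse, k=2, alpha=1.0)
hybrid = hybrid_top_k(dense, sparse, k=2, alpha=0.5)
pair = {"sleeper", "trigger"}
print(vector_only, co_retrieval_rate([vector_only], [pair]))
print(hybrid, co_retrieval_rate([hybrid], [pair]))
```

In this toy setting the poisoned pair fills the top-2 under pure vector retrieval, while the fused ranking lets a lexically relevant benign document displace one of them. As a point of arithmetic, a 38.0 percent rate over 50 attempts corresponds to 19 attempts, assuming each attempt is counted once.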
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
