RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation
Pankayaraj Pathmanathan, Michael-Andrei Panaitescu-Liess, Cho-Yu Jason Chiang, Furong Huang

TL;DR
This paper introduces RAGPart and RAGMask, two lightweight retrieval-stage defenses that protect retrieval-augmented generation models from corpus poisoning attacks by detecting and mitigating malicious documents.
Contribution
The paper presents novel, computationally efficient defenses operating at the retrieval stage, improving robustness of RAG models against corpus poisoning without altering the generation component.
Findings
Defenses significantly reduce attack success rates across multiple benchmarks.
RAGPart and RAGMask maintain utility under benign conditions.
The defenses are effective against various poisoning strategies and retriever types.
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm to enhance large language models (LLMs) with external knowledge, reducing hallucinations and compensating for outdated information. However, recent studies have exposed a critical vulnerability in RAG pipelines: corpus poisoning, where adversaries inject malicious documents into the retrieval corpus to manipulate model outputs. In this work, we propose two complementary retrieval-stage defenses: RAGPart and RAGMask. Our defenses operate directly on the retriever, making them computationally lightweight and requiring no modification to the generation model. RAGPart leverages the inherent training dynamics of dense retrievers, exploiting document partitioning to mitigate the effect of poisoned points. In contrast, RAGMask identifies suspicious tokens based on significant similarity shifts under targeted token masking.…
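The abstract is cut off before it gives algorithmic details, so the following is only a rough sketch of the general idea behind masking-based scoring: remove one token at a time and measure how much the query-document similarity drops. It is not the authors' RAGMask procedure. The bag-of-words `embed` function is a toy stand-in for a dense retriever, and the names `similarity_shifts` and `flag_suspicious` and the `threshold` parameter are hypothetical, introduced here for illustration.

```python
import math
from collections import Counter


def embed(tokens):
    """Toy stand-in for a dense retriever: a bag-of-words count vector (assumption)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_shifts(query_tokens, doc_tokens):
    """Return the base similarity and, per token, the drop caused by masking it."""
    q = embed(query_tokens)
    base = cosine(q, embed(doc_tokens))
    shifts = []
    for i, tok in enumerate(doc_tokens):
        masked = doc_tokens[:i] + doc_tokens[i + 1:]  # drop token i only
        shifts.append((tok, base - cosine(q, embed(masked))))
    return base, shifts


def flag_suspicious(query_tokens, doc_tokens, threshold):
    """Flag tokens whose removal lowers similarity by more than `threshold` (hypothetical rule)."""
    _, shifts = similarity_shifts(query_tokens, doc_tokens)
    return [tok for tok, shift in shifts if shift > threshold]


query = ["capital", "france"]
doc = ["paris", "is", "the", "capital", "of", "france"]
base, shifts = similarity_shifts(query, doc)
flagged = flag_suspicious(query, doc, threshold=0.1)
print(round(base, 3), flagged)
```

In this toy setting the flagged tokens are simply those that overlap with the query, because a count vector has no other way to raise similarity. With a learned retriever, the same loop would instead surface whichever tokens the embedding relies on most, which is where adversarially inserted tokens would be expected to stand out.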
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
