Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems
Minseok Kim, Hankook Lee, Hyungjoon Koo

TL;DR
This paper introduces RAGDefender, a resource-efficient post-retrieval defense mechanism for RAG systems that effectively detects and filters adversarial content, significantly reducing attack success rates without additional training costs.
Contribution
RAGDefender is a novel lightweight defense method operating during post-retrieval, outperforming existing defenses in efficiency and effectiveness against knowledge corruption attacks in RAG systems.
Findings
RAGDefender reduces attack success rate from 0.89 to 0.02 on Gemini model.
It outperforms RobustRAG and Discern-and-Answer in adversarial scenarios.
Operates without additional training or inference costs.
Abstract
Large language models (LLMs) are reshaping numerous facets of our daily lives, leading widespread adoption as web-based services. Despite their versatility, LLMs face notable challenges, such as generating hallucinated content and lacking access to up-to-date information. Lately, to address such limitations, Retrieval-Augmented Generation (RAG) has emerged as a promising direction by generating responses grounded in external knowledge sources. A typical RAG system consists of i) a retriever that probes a group of relevant passages from a knowledge base and ii) a generator that formulates a response based on the retrieved content. However, as with other AI systems, recent studies demonstrate the vulnerability of RAG, such as knowledge corruption attacks by injecting misleading information. In response, several defense strategies have been proposed, including having LLMs inspect the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
