Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems

Minseok Kim; Hankook Lee; Hyungjoon Koo

arXiv:2511.01268·cs.CR·November 4, 2025

Open Access

TL;DR

This paper introduces RAGDefender, a resource-efficient post-retrieval defense mechanism for RAG systems that effectively detects and filters adversarial content, significantly reducing attack success rates without additional training costs.

Contribution

RAGDefender is a novel, lightweight defense that operates at the post-retrieval stage and outperforms existing defenses in both efficiency and effectiveness against knowledge corruption attacks on RAG systems.

Findings

01. RAGDefender reduces the attack success rate from 0.89 to 0.02 on the Gemini model.

02. It outperforms RobustRAG and Discern-and-Answer in adversarial scenarios.

03. It operates without additional training or inference costs.

Abstract

Large language models (LLMs) are reshaping numerous facets of our daily lives, leading to their widespread adoption as web-based services. Despite their versatility, LLMs face notable challenges, such as generating hallucinated content and lacking access to up-to-date information. To address these limitations, Retrieval-Augmented Generation (RAG) has recently emerged as a promising direction, generating responses grounded in external knowledge sources. A typical RAG system consists of i) a retriever that fetches a set of relevant passages from a knowledge base and ii) a generator that formulates a response based on the retrieved content. However, as with other AI systems, recent studies demonstrate that RAG is vulnerable to attacks such as knowledge corruption, in which misleading information is injected into the knowledge base. In response, several defense strategies have been proposed, including having LLMs inspect the…
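
The retriever/generator split described in the abstract, and the point at which a post-retrieval defense would sit, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the word-overlap retriever, the prompt-building `generate` stand-in (no real LLM is called), and the `post_filter` hook are hypothetical names, and the filter shown in the usage note is a toy placeholder, not the RAGDefender method, which this excerpt does not describe.

```python
from collections import Counter
from typing import Callable, List, Optional


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer; a real retriever would use embeddings.
    return text.lower().split()


def retrieve(query: str, knowledge_base: List[str], k: int = 2) -> List[str]:
    # Toy retriever: rank passages by the number of words shared with the query.
    query_counts = Counter(tokenize(query))

    def score(passage: str) -> int:
        overlap = query_counts & Counter(tokenize(passage))
        return sum(overlap.values())

    # sorted() is stable, so ties keep their knowledge-base order.
    return sorted(knowledge_base, key=score, reverse=True)[:k]


def generate(query: str, passages: List[str]) -> str:
    # Stand-in for the generator: it only assembles the prompt an LLM would see.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\nQuestion: {query}"


def rag_answer(
    query: str,
    knowledge_base: List[str],
    k: int = 2,
    post_filter: Optional[Callable[[List[str]], List[str]]] = None,
) -> str:
    passages = retrieve(query, knowledge_base, k)
    # Hypothetical hook: a post-retrieval defense would drop suspicious
    # passages here, after retrieval and before generation.
    if post_filter is not None:
        passages = post_filter(passages)
    return generate(query, passages)
```

For example, calling `rag_answer(query, kb, post_filter=my_filter)` with a function that removes passages judged adversarial leaves both the retriever and the generator untouched, which is the sense in which a post-retrieval defense needs no retraining of either component.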

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks