Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation

Kennedy Edemacu; Vinay M. Shashidhar; Micheal Tuape; Dan Abudu; Beakcheol Jang; Jong Wook Kim

arXiv:2508.02835·cs.LG·March 30, 2026

Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation

Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, Jong Wook Kim

PDF

TL;DR

This paper addresses the vulnerability of Retrieval-Augmented Generation systems to knowledge poisoning attacks and proposes novel filtering methods to defend against adversarial data injection, maintaining system performance.

Contribution

It introduces FilterRAG and ML-FilterRAG, new defense techniques that effectively detect and filter adversarial texts in knowledge sources for RAG models.

Findings

01

The proposed methods effectively mitigate PoisonedRAG attacks.

02

Evaluation shows close performance to original RAG systems.

03

New property to differentiate adversarial from clean texts.

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to boost the capabilities of large language models (LLMs) by incorporating external, up-to-date knowledge sources. However, this introduces a potential vulnerability to knowledge poisoning attacks, where attackers can compromise the knowledge source to mislead the generation model. One such attack is the PoisonedRAG in which the injected adversarial texts steer the model to generate an attacker-chosen response to a target question. In this work, we propose novel defense methods, FilterRAG and ML-FilterRAG, to mitigate the PoisonedRAG attack. First, we propose a new property to uncover distinct properties to differentiate between adversarial and clean texts in the knowledge data source. Next, we employ this property to filter out adversarial texts from clean ones in the design of our proposed approaches. Evaluation…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.