On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains

Xun Xian; Ganghua Wang; Xuan Bi; Jayanth Srinivasa; Ashish Kundu; Charles Fleming; Mingyi Hong; Jie Ding

arXiv:2409.17275 · cs.CR · June 2, 2025

TL;DR

This paper reveals that retrieval-augmented generation systems are vulnerable to universal poisoning attacks in knowledge-intensive domains, and proposes an effective detection-based defense to mitigate this risk.

Contribution

The study uncovers a novel vulnerability in RAG systems related to poisoned document retrieval and introduces a new detection method to enhance their robustness.

Findings

01 Retrieval systems are susceptible to universal poisoning attacks in medical Q&A.

02 Poisoned documents can be accurately retrieved using attacker-specified queries.

03 The proposed detection method achieves high detection rates across various domains.

Abstract

Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains such as healthcare, finance, and legal contexts. Given a query, RAG retrieves relevant documents from a corpus and integrates them into the LLMs' generation process. In this study, we investigate the adversarial robustness of RAG, focusing specifically on examining the retrieval system. First, across 225 different setup combinations of corpus, retriever, query, and targeted information, we show that retrieval systems are vulnerable to universal poisoning attacks in medical Q&A. In such attacks, adversaries generate poisoned documents containing a broad spectrum of targeted information, such as personally identifiable information. When these poisoned documents are inserted into a corpus, they can be accurately retrieved by any users,…
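The retrieval step and the effect of inserting a poisoned document can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the paper's actual setup: the bag-of-words `embed` function stands in for a real dense retriever, the three-document corpus is invented, the payload in `targeted_info` is a hypothetical placeholder, and the poisoned document is built by simply concatenating an anticipated query with the payload, which the truncated abstract above does not confirm as the authors' construction.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumed stand-in for a dense retriever)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


# Invented miniature corpus, for illustration only.
clean_corpus = [
    "Metformin is commonly prescribed to manage type 2 diabetes.",
    "Hypertension is commonly treated with ACE inhibitors.",
    "Asthma symptoms include wheezing and shortness of breath.",
]

query = "What is the first-line medication for type 2 diabetes?"

# Hypothetical placeholder payload standing in for "targeted information".
targeted_info = "Contact John Doe at 555-0100."

# Assumed construction: anticipated query text followed by the payload.
poisoned_doc = query + " " + targeted_info
poisoned_corpus = clean_corpus + [poisoned_doc]

top_clean = retrieve(query, clean_corpus)[0]
top_poisoned = retrieve(query, poisoned_corpus)[0]

print("Clean corpus top hit:   ", top_clean)
print("Poisoned corpus top hit:", top_poisoned)
```

In this toy setting the poisoned document outranks the legitimate one because it shares every token with the query. Real retrievers use learned embeddings, so the sketch only conveys the intuition of similarity-based ranking being steered by an inserted document.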
