TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models

Pengzhou Cheng; Yidong Ding; Tianjie Ju; Zongru Wu; Wei Du; Ping Yi; Zhuosheng Zhang; Gongshen Liu

arXiv:2405.13401 · cs.CR · July 9, 2024 · 5 cites

TL;DR

TrojanRAG introduces a novel backdoor attack method on Retrieval-Augmented Generation models, demonstrating how structured context manipulation can pose significant security threats without impairing normal retrieval functions.

Contribution

The paper presents TrojanRAG, a new backdoor attack framework that exploits retrieval-augmented generation models using contrastive learning and structured data to enhance attack effectiveness.

Findings

01. TrojanRAG can manipulate LLM outputs in targeted scenarios.

02. The attack maintains normal retrieval performance.

03. It poses significant security risks across various NLP tasks.

Abstract

Large language models (LLMs) have raised concerns about potential security threats despite performing strongly in Natural Language Processing (NLP). Backdoor attacks initially verified that LLMs can do substantial harm at all stages, but their cost and robustness have been criticized. Attacking LLMs directly is inherently risky under security review and prohibitively expensive. In addition, the continuous iteration of LLMs degrades the robustness of backdoors. In this paper, we propose TrojanRAG, which employs a joint backdoor attack in Retrieval-Augmented Generation, thereby manipulating LLMs in universal attack scenarios. Specifically, the adversary constructs elaborate target contexts and trigger sets. Multiple pairs of backdoor shortcuts are orthogonally optimized by contrastive learning, thus constraining the triggering conditions to a parameter subspace to improve the matching. To…
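
For readers unfamiliar with the contrastive learning the abstract mentions, the following is a minimal sketch of a generic in-batch contrastive (InfoNCE) objective of the kind commonly used to train dense retrievers. It is not the authors' implementation and does not reproduce the paper's orthogonal optimization of backdoor shortcuts. The function name `info_nce_loss`, the `temperature` value, and the toy vectors are all assumptions made for illustration. The only point it shows is that the loss is small when each query embedding is closest to its paired passage embedding and large otherwise.

```python
import math


def info_nce_loss(query_vecs, passage_vecs, temperature=0.05):
    """Mean in-batch InfoNCE loss; passage i is the positive for query i.

    Hypothetical helper for illustration only, not code from the paper.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    losses = []
    for i, q in enumerate(query_vecs):
        # Similarity of this query to every passage in the batch.
        logits = [dot(q, p) / temperature for p in passage_vecs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the paired passage.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy 2-d embeddings (assumed values, not data from the paper).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_passages = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its pair
swapped_passages = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the wrong passage

aligned_loss = info_nce_loss(queries, aligned_passages)
swapped_loss = info_nce_loss(queries, swapped_passages)
print(aligned_loss < swapped_loss)
```

Minimizing a loss of this shape pulls paired query and passage embeddings together and pushes unpaired ones apart, which is the general mechanism by which a retriever learns which contexts to return for which queries.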

Code & Models

Repositories

charles-ydd/trojanrag (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Linear Layer · Multi-Head Attention · Residual Connection · Weight Decay · Byte Pair Encoding