POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models
Yangguang Shao, Xinjie Lin, Haozheng Luo, Chengshang Hou, Gang Xiong, Jiahao Yu, Junzheng Shi

TL;DR
This paper introduces POISONCRAFT, a practical poisoning attack on retrieval-augmented generation systems that can mislead large language models by injecting fraudulent information, exposing security vulnerabilities in real-world applications.
Contribution
We propose POISONCRAFT, a novel poisoning attack on RAG systems that does not require user query access and remains effective across different models and defenses.
Findings
POISONCRAFT effectively misleads RAG models across datasets and models.
The attack transfers successfully to black-box systems.
It influences retrieval behavior and reasoning steps in LLMs.
Abstract
Large language models (LLMs) have achieved remarkable success in various domains, primarily due to their strong capabilities in reasoning and generating human-like text. Despite their impressive performance, LLMs are susceptible to hallucinations, which can lead to incorrect or misleading outputs. This is primarily due to the lack of up-to-date knowledge or domain-specific information. Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources. However, the security of RAG systems has not been thoroughly studied. In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites. Compared to existing poisoning attacks on RAG systems, our attack is more practical as it does not require access to the target user query's info or edit the user query. It…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
