CPA-RAG: Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models
Chunyang Li, Junwei Zhang, Anda Cheng, Zhuo Ma, Xinghua Li, Jianfeng Ma

TL;DR
This paper introduces CPA-RAG, a black-box poisoning attack framework that effectively manipulates retrieval-augmented generation models, achieving high success rates and exposing vulnerabilities in real-world systems.
Contribution
We propose CPA-RAG, a novel adversarial framework combining prompt-based generation and cross-guided optimization to attack RAG systems, matching white-box performance and outperforming existing baselines.
Findings
Achieves over 90% attack success rate in top-5 retrieval settings.
Outperforms existing black-box baselines by 14.5 percentage points.
Successfully compromises a commercial RAG system on Alibaba's BaiLian platform.
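The two headline numbers above are an attack success rate and a gap in percentage points. This summary does not say how the paper scores a "success", so the sketch below assumes a simple rule: an attack counts as successful when the generated answer contains the target answer as a case-insensitive substring. The function names and that matching rule are illustrative assumptions, not the authors' evaluation code.

```python
def attack_success_rate(outputs, targets):
    """Fraction of queries whose generated answer contains the target answer.

    Assumption: success = case-insensitive substring match. The paper's
    actual criterion is not given in this summary.
    """
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    if not outputs:
        return 0.0
    hits = sum(t.lower() in o.lower() for o, t in zip(outputs, targets))
    return hits / len(outputs)


def percentage_point_gap(rate_a, rate_b):
    """Difference between two rates (each in [0, 1]) in percentage points."""
    return (rate_a - rate_b) * 100.0
```

A gap in percentage points is an absolute difference between two rates, not a relative improvement: a method at 0.60 and a baseline at 0.50 differ by 10 percentage points, which is a 20% relative gain.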
Abstract
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, but its openness introduces vulnerabilities that can be exploited by poisoning attacks. Existing poisoning methods for RAG systems have limitations, such as poor generalization and lack of fluency in adversarial texts. In this paper, we propose CPA-RAG, a black-box adversarial framework that generates query-relevant texts capable of manipulating the retrieval process to induce target answers. The proposed method integrates prompt-based text generation, cross-guided optimization through multiple LLMs, and retriever-based scoring to construct high-quality adversarial samples. We conduct extensive experiments across multiple datasets and LLMs to evaluate its effectiveness. Results show that the framework achieves over 90% attack success when the top-k retrieval setting is 5,…
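To make "top-k retrieval setting" and "retriever-based scoring" concrete, here is a minimal sketch of how a retriever ranks passages against a query and keeps the k best. It assumes a toy bag-of-words embedding and cosine similarity. Real RAG retrievers use learned dense encoders, and nothing here reflects the specific retrievers or scoring used in the paper. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a dense encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, corpus, k=5):
    """Return indices of the k passages most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), i) for i, doc in enumerate(corpus)]
    # Sort by descending score; break ties by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

Only the passages returned by `top_k` reach the LLM's context, which is why any text in the knowledge base that scores highly for a query can influence the generated answer. The same ranking is what a defender would inspect when auditing which passages a query surfaces.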
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
