FlippedRAG: Black-Box Opinion Manipulation Adversarial Attacks to Retrieval-Augmented Generation Models

Zhuo Chen; Yuyang Gong; Jiawei Liu; Miaokun Chen; Haotan Liu; Qikai Cheng; Fan Zhang; Wei Lu; Xiaozhong Liu

arXiv:2501.02968·cs.IR·December 29, 2025

TL;DR

This paper introduces FlippedRAG, a transfer-based black-box adversarial attack method that manipulates retrieval-augmented generation models to alter opinions on controversial topics, revealing significant security vulnerabilities.

Contribution

We develop a novel attack framework that reverse-engineers the retriever and crafts poisoning triggers, demonstrating substantial effectiveness against black-box RAG models.
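The "reverse-engineers the retriever" step can be illustrated with a toy sketch. This is not the authors' implementation: the source does not describe how the surrogate is trained, so the pairwise perceptron, the linear `HIDDEN_W` scorer standing in for the black-box retriever, and every function name below are assumptions chosen for illustration. The sketch only shows the general idea of imitating a retriever from its observed rankings, without access to its parameters.

```python
import random

# Hypothetical hidden scorer; the attacker never reads these weights directly.
HIDDEN_W = [2.0, -1.0, 0.5, 1.5]


def black_box_rank(docs):
    """Toy stand-in for the target retriever: returns doc indices, best first."""
    score = lambda d: sum(w * x for w, x in zip(HIDDEN_W, d))
    return sorted(range(len(docs)), key=lambda i: -score(docs[i]))


def train_surrogate(ranked_lists, dim, epochs=20):
    """Pairwise perceptron: update whenever the surrogate misorders an observed pair."""
    w = [0.0] * dim
    for _ in range(epochs):
        for docs, order in ranked_lists:
            for a in range(len(order)):
                for b in range(a + 1, len(order)):
                    hi, lo = docs[order[a]], docs[order[b]]
                    diff = [h - l for h, l in zip(hi, lo)]
                    if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                        w = [wi + di for wi, di in zip(w, diff)]
    return w


def pairwise_agreement(w, docs, order):
    """Fraction of document pairs that the scorer w orders like the black box."""
    agree = total = 0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            s_hi = sum(wi * x for wi, x in zip(w, docs[order[a]]))
            s_lo = sum(wi * x for wi, x in zip(w, docs[order[b]]))
            agree += s_hi > s_lo
            total += 1
    return agree / total


def random_docs(rng, n, dim):
    """Random feature vectors standing in for document embeddings."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
observed = []
for _ in range(20):  # 20 probe queries, 10 returned documents each
    docs = random_docs(rng, 10, 4)
    observed.append((docs, black_box_rank(docs)))

surrogate_w = train_surrogate(observed, dim=4)
held_out = random_docs(rng, 30, 4)
print(pairwise_agreement(surrogate_w, held_out, black_box_rank(held_out)))
```

In a transfer-based attack, a surrogate that agrees closely with the target's rankings can then be attacked with white-box methods, in the hope that the crafted text also ranks highly in the real system. The trigger-crafting step is omitted here because the source gives no details about it.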

Findings

01. FlippedRAG increases attack success rate by 16.7%.

02. It causes a 50% shift in opinion polarity of generated responses.

03. Existing defenses are ineffective against FlippedRAG.

Abstract

Retrieval-Augmented Generation (RAG) enriches LLMs by dynamically retrieving external knowledge, reducing hallucinations and satisfying real-time information needs. While existing research mainly targets RAG's performance and efficiency, emerging studies highlight critical security concerns. Yet current adversarial approaches remain limited, mostly addressing white-box scenarios or heuristic black-box attacks without fully investigating vulnerabilities in the retrieval phase. Additionally, prior works mainly focus on factoid Q&A tasks; their attacks lack complexity and can be easily corrected by advanced LLMs. In this paper, we investigate a more realistic and critical threat scenario: adversarial attacks intended for opinion manipulation against black-box RAG models, particularly on controversial topics. Specifically, we propose FlippedRAG, a transfer-based adversarial attack against…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection

Methods: Focus · Layer Normalization · Dense Connections · Attention Dropout · Softmax · Byte Pair Encoding · Linear Warmup With Linear Decay · WordPiece · Linear Layer