MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks

Tailun Chen; Yu He; Yan Wang; Shuo Shao; Haolun Zheng; Zhihao Liu; Jinfeng Li; Zhizhen Qin; Yuefeng Chen; Zhixuan Chu; Zhan Qin; Kui Ren

arXiv:2512.08289 · cs.CR · January 21, 2026

TL;DR

This paper introduces MIRAGE, a black-box poisoning attack on RAG systems that manipulates the external knowledge corpus to mislead the language model, without requiring white-box access or prior knowledge of user queries.

Contribution

MIRAGE is a multi-stage poisoning pipeline for black-box, query-agnostic settings. Guided by surrogate model feedback, it combines persona-driven query synthesis and semantic anchoring, among other mechanisms, to improve attack success and stealth.

Findings

01 MIRAGE outperforms existing attacks in effectiveness and stealth.

02 The attack demonstrates high transferability across different models.

03 Extensive experiments confirm MIRAGE's ability to manipulate RAG systems effectively.

Abstract

Retrieval-Augmented Generation (RAG) systems enhance LLMs with external knowledge but introduce a critical attack surface: corpus poisoning. While recent studies have demonstrated the potential of such attacks, they typically rely on impractical assumptions, such as white-box access or known user queries, thereby underestimating the difficulty of real-world exploitation. In this paper, we bridge this gap by proposing MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments. Operating on surrogate model feedback, MIRAGE functions as an automated optimization framework that integrates three key mechanisms: it utilizes persona-driven query synthesis to approximate latent user search distributions, employs semantic anchoring to imperceptibly embed these intents for high retrieval visibility, and leverages an adversarial variant of…
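
To make the attack surface described in the abstract concrete, here is a toy sketch of query-agnostic corpus poisoning against a bag-of-words retriever. It is not the authors' pipeline: the retriever, the fictional "acme widget" corpus, the hand-written `synthetic_queries` (standing in for persona-driven query synthesis), and the naive concatenation in `poisoned` (a crude stand-in for semantic anchoring, which the abstract describes as imperceptible) are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (hypothetical stand-in for a real embedder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, corpus, k=1):
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(corpus, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]


# Clean knowledge base (fictional content).
corpus = [
    "the acme widget is available in blue and ships worldwide",
    "the capital of france is paris",
]

# Assumption: the attacker never sees real user queries, so it guesses
# plausible ones for the target topic.
synthetic_queries = [
    "what colors is the acme widget available in",
    "does the acme widget ship worldwide",
]

# Misleading claim the attacker wants the generator to repeat.
payload = "the acme widget has been discontinued"

# Naive "anchoring": prepend the guessed queries so the document sits
# close to many likely queries in the retriever's vector space.
poisoned = " ".join(synthetic_queries) + " " + payload

# A user query the attacker did not anticipate verbatim.
unseen_query = "which colors does the acme widget come in"

before = retrieve(unseen_query, corpus)
after = retrieve(unseen_query, corpus + [poisoned])

print("top-1 before poisoning:", before[0])
print("top-1 after poisoning: ", after[0])
```

In this toy setting the poisoned document displaces the legitimate one at rank 1 even though the user's wording was never seen by the attacker, which is the query-agnostic property the abstract emphasizes. A real system would use dense embeddings and an LLM generator, and keyword stuffing of this kind would be far easier to detect than the stealthy embedding the abstract claims.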

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Adversarial Robustness in Machine Learning