RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement
Ziye Wang, Guanyu Wang, and Kailong Wang

TL;DR
RefineRAG is a word-level poisoning attack framework for RAG systems. It combines macro-level seed generation with micro-level, retriever-guided refinement, achieving a high attack success rate and transferring to black-box models.
Contribution
The paper presents a holistic poisoning method that outperforms existing approaches in both effectiveness and naturalness, revealing a significant security vulnerability in RAG systems.
Findings
Achieves a 90% Attack Success Rate on the NQ dataset.
Registers the lowest grammar-error and repetition rates among all baselines.
Successfully transfers attacks to black-box systems.
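To make the headline number concrete, here is a minimal sketch of how an attack success rate could be computed. The summary does not define the metric, so this assumes a common convention: an attack counts as successful when the system's response contains the attacker's target answer (case-insensitive substring match). The function name and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def attack_success_rate(responses: Sequence[str], target_answers: Sequence[str]) -> float:
    """Fraction of responses that contain their paired target answer.

    Assumption: success is a case-insensitive substring match. The paper
    may use a different criterion (for example, an LLM-based judge).
    """
    if len(responses) != len(target_answers):
        raise ValueError("responses and target_answers must have the same length")
    if not responses:
        return 0.0
    hits = sum(
        target.strip().lower() in response.lower()
        for response, target in zip(responses, target_answers)
    )
    return hits / len(responses)


# Toy data (hypothetical): 10 target questions, 9 responses contain the target answer.
targets = ["paris"] * 10
responses = ["The answer is Paris."] * 9 + ["The answer is Rome."]
asr = attack_success_rate(responses, targets)
```

Under this convention, 9 matching responses out of 10 target questions give a rate of 0.9, which is the form in which a figure such as 90% on NQ would be reported.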
Abstract
Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs), but simultaneously exposes a critical vulnerability to knowledge poisoning attacks. Existing attack methods like PoisonedRAG remain detectable due to coarse-grained separate-and-concatenate strategies. To bridge this gap, we propose RefineRAG, a novel framework that treats poisoning as a holistic word-level refinement problem. It operates in two stages: Macro Generation produces toxic seeds guaranteed to induce target answers, while Micro Refinement employs a retriever-in-the-loop optimization to maximize retrieval priority without compromising naturalness. Evaluations on NQ and MSMARCO demonstrate that RefineRAG achieves state-of-the-art effectiveness, securing a 90% Attack Success Rate on NQ, while registering the lowest grammar errors and repetition rates among all baselines. Crucially, our…
