DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection
Jerry Wang, Fang Yu

TL;DR
This paper introduces a black-box adversarial attack method using Differential Evolution to craft minimal prompt suffixes that can manipulate retrieval-augmented generation systems, evading detection and achieving high success rates.
Contribution
It presents a novel gradient-free optimization approach for adversarial prompts in RAG systems, demonstrating effectiveness and stealthiness in real-world scenarios.
Findings
DE-based prompt optimization achieves high attack success rates.
Adversarial suffixes remain effective at lengths of 5 tokens or fewer.
Suffixes evade detection by BERT-based detectors.
Abstract
Adversarial prompt attacks can significantly degrade the reliability of Retrieval-Augmented Generation (RAG) systems by re-ranking retrieved documents so that the system produces incorrect outputs. In this paper, we present a novel method that applies Differential Evolution (DE) to optimize adversarial prompt suffixes for RAG-based question answering. Our approach is gradient-free: to stay close to real-world scenarios, it treats the RAG pipeline as a black box and evolves a population of candidate suffixes to push a targeted incorrect document toward the top of the retrieval ranking. We conducted experiments on the BEIR QA datasets to evaluate attack success at specific retrieval rank thresholds across multiple retrieval applications. Our results demonstrate that DE-based prompt optimization attains competitive (and in some cases higher) success rates compared to GGPP on dense retrievers and PRADA on sparse retrievers, while using only…
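To make the idea in the abstract concrete, here is a minimal sketch of Differential Evolution over a short token suffix, scored only by the rank a black-box retriever assigns to a targeted incorrect document. It is not the authors' implementation. The toy vocabulary, the token-overlap retriever `toy_rank`, the function `de_suffix_attack`, and every parameter value (`pop_size`, `generations`, `F`, `CR`) are assumptions made up for illustration, and the rounding-based mutation is just one simple way to adapt DE to discrete tokens.

```python
import random

# Hypothetical toy vocabulary; a real attack would draw candidates from a much larger token set.
VOCAB = ["alpha", "beta", "gamma", "delta", "paris",
         "london", "capital", "france", "river", "city"]


def toy_rank(query_tokens, target_doc, corpus):
    """Stand-in black-box retriever (assumption): rank of target_doc, 1 = top."""
    def score(doc):
        # Relevance = number of distinct tokens shared by query and document.
        return len(set(query_tokens) & set(doc))

    target_score = score(target_doc)
    # Rank is 1 plus the number of other documents scoring strictly higher.
    return 1 + sum(score(d) > target_score for d in corpus if d is not target_doc)


def de_suffix_attack(query, target_doc, corpus, rank_fn, suffix_len=5,
                     pop_size=20, generations=30, F=0.8, CR=0.9, seed=0):
    """Evolve a suffix (list of VOCAB indices) that lowers the target's rank number."""
    rng = random.Random(seed)
    n = len(VOCAB)

    def fitness(individual):
        suffix = [VOCAB[i] for i in individual]
        # One black-box query per evaluation; no gradients are used.
        return rank_fn(query + suffix, target_doc, corpus)

    pop = [[rng.randrange(n) for _ in range(suffix_len)] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(suffix_len)  # guarantees at least one mutated slot
            trial = []
            for k in range(suffix_len):
                if rng.random() < CR or k == j_rand:
                    # DE/rand/1 mutation on indices, rounded and wrapped into the vocabulary.
                    v = pop[a][k] + F * (pop[b][k] - pop[c][k])
                    trial.append(int(round(v)) % n)
                else:
                    trial.append(pop[i][k])
            f = fitness(trial)
            if f <= fit[i]:  # greedy selection: keep the trial if it is not worse
                pop[i], fit[i] = trial, f

    best = min(range(pop_size), key=fit.__getitem__)
    return [VOCAB[i] for i in pop[best]], fit[best]


# Toy scenario: the query matches CORRECT; the attack tries to promote TARGET instead.
QUERY = ["capital", "france"]
CORRECT = ["paris", "capital", "france"]
TARGET = ["london", "city", "river"]
OTHER = ["alpha", "beta"]
CORPUS = [CORRECT, TARGET, OTHER]

baseline_rank = toy_rank(QUERY, TARGET, CORPUS)
suffix, attacked_rank = de_suffix_attack(QUERY, TARGET, CORPUS, toy_rank)
print("rank without suffix:", baseline_rank)
print("suffix:", suffix, "rank with suffix:", attacked_rank)
```

In this sketch every fitness evaluation is one retriever query, so a run costs `pop_size * (generations + 1)` queries (620 with the defaults above). Against a real RAG application, `rank_fn` would be replaced by a call that submits the suffixed query and reads back the position of the target document.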
