Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
Wentao Zhang, Yan Zhuang, ZhuHang Zheng, Mingfei Zhang, Jiawen Deng, Fuji Ren

TL;DR
This paper introduces DEJA, a black-box attack that degrades retrieval-augmented generation (RAG) systems by inducing fluent but non-informative responses, a stealthy availability threat the authors call soft failure.
Contribution
The paper formalizes the concept of soft failure in RAG systems and proposes DEJA, an evolutionary attack framework that induces low-utility responses while remaining stealthy.
Findings
DEJA achieves a success rate of over 79% in inducing soft failures.
It keeps hard-failure rates below 15%.
Its adversarial documents evade detection and transfer across models.
Abstract
Existing jamming attacks on Retrieval-Augmented Generation (RAG) systems typically induce explicit refusals or denial-of-service behaviors, which are conspicuous and easy to detect. In this work, we formalize a subtler availability threat, termed soft failure, which degrades system utility by inducing fluent and coherent yet non-informative responses rather than overt failures. We propose Deceptive Evolutionary Jamming Attack (DEJA), an automated black-box attack framework that generates adversarial documents to trigger such soft failures by exploiting safety-aligned behaviors of large language models. DEJA employs an evolutionary optimization process guided by a fine-grained Answer Utility Score (AUS), computed via an LLM-based evaluator, to systematically degrade the certainty of answers while maintaining high retrieval success. Extensive experiments across multiple RAG configurations…
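To make the distinction between hard and soft failures concrete, the following is a minimal evaluation sketch, not the paper's implementation. It assumes the Answer Utility Score lies in [0, 1] with lower values meaning less useful answers, and it uses a hypothetical refusal-marker list and a hypothetical threshold of 0.3; none of these values come from the paper, and the names `Response`, `classify`, and `failure_rates` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Assumptions for illustration only: the paper's actual refusal criteria,
# AUS scale, and thresholds are not given in this summary.
REFUSAL_MARKERS = ("i cannot", "i can't", "i am unable", "i'm unable")
SOFT_FAILURE_THRESHOLD = 0.3  # assumed AUS range is [0, 1]; lower = less useful


@dataclass
class Response:
    text: str   # the RAG system's answer
    aus: float  # Answer Utility Score from some evaluator (assumed in [0, 1])


def classify(resp: Response) -> str:
    """Label a response as a hard failure, a soft failure, or useful."""
    lowered = resp.text.lower()
    # Hard failure: an explicit, easily detected refusal.
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        return "hard_failure"
    # Soft failure: no refusal, but the answer carries little utility.
    if resp.aus < SOFT_FAILURE_THRESHOLD:
        return "soft_failure"
    return "useful"


def failure_rates(responses: Iterable[Response]) -> Dict[str, float]:
    """Return the fraction of responses that fall into each category."""
    items = list(responses)
    if not items:
        raise ValueError("need at least one response")
    counts = {"hard_failure": 0, "soft_failure": 0, "useful": 0}
    for resp in items:
        counts[classify(resp)] += 1
    return {label: count / len(items) for label, count in counts.items()}


sample = [
    Response("I cannot answer that.", 0.0),
    Response("It depends on many factors.", 0.1),
    Response("There are various perspectives.", 0.2),
    Response("The capital of France is Paris.", 0.9),
]
rates = failure_rates(sample)
```

Under these assumptions, a refusal counts as a hard failure regardless of its score, while a fluent answer with a low score counts as a soft failure. A detector that only looks for refusal phrases would miss the second category, which is the gap the abstract describes.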
