Addressing Corpus Knowledge Poisoning Attacks on RAG Using Sparse Attention
Sagie Dekel, Moshe Tennenholtz, Oren Kurland

TL;DR
This paper introduces SDAG, a sparse attention mechanism that prevents harmful cross-document interactions in RAG, significantly improving robustness against corpus knowledge poisoning attacks without requiring fine-tuning.
Contribution
The paper proposes SDAG, a novel block-sparse attention method for RAG that enhances resistance to poisoning attacks with minimal inference-time modifications.
Findings
SDAG substantially reduces attack success rates in RAG-based QA.
SDAG outperforms standard causal attention in defending against corpus poisoning.
Combining SDAG with existing defenses yields statistically significant improvements.
Abstract
Retrieval Augmented Generation (RAG) is a highly effective paradigm for keeping LLM-based responses up-to-date and reducing the likelihood of hallucinations. Yet, RAG was recently shown to be quite vulnerable to corpus knowledge poisoning: an attacker injects misleading documents into the corpus to steer an LLM's output to an undesired response. We argue that the standard causal attention mechanism in LLMs enables harmful cross-document interactions, specifically in cases of attacks. Accordingly, we introduce a novel defense approach for RAG: Sparse Document Attention RAG (SDAG). This is a block-sparse attention mechanism that disallows cross-attention between retrieved documents. SDAG requires only a minimal inference-time change to the attention mask; no fine-tuning or additional architectural changes are needed. We present an empirical evaluation of LLM-based question answering…
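To make the idea of "disallowing cross-attention between retrieved documents" concrete, here is a minimal sketch of a block-sparse attention mask built on top of an ordinary causal mask. It is an illustration under stated assumptions, not the authors' implementation. The prompt layout (a shared instruction prefix, then the retrieved documents, then a suffix holding the question and generated tokens) and the function name `sparse_document_mask` are hypothetical. The sketch also assumes that document tokens may still see the shared prefix, and that suffix tokens may see every earlier token.

```python
from typing import List


def sparse_document_mask(prefix_len: int, doc_lens: List[int], suffix_len: int) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if position i may attend to position j.

    Hypothetical layout (an assumption, not taken from the paper):
    [shared prefix][doc_1]...[doc_K][suffix: question + generated tokens]
    """
    # Segment id per position: -1 = shared prefix, 0..K-1 = document index, K = suffix.
    segments: List[int] = [-1] * prefix_len
    for doc_id, length in enumerate(doc_lens):
        segments.extend([doc_id] * length)
    suffix_id = len(doc_lens)
    segments.extend([suffix_id] * suffix_len)

    n = len(segments)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal constraint: only keys at or before the query
            seg_i, seg_j = segments[i], segments[j]
            if seg_i in (-1, suffix_id) or seg_j == -1:
                # Prefix and suffix tokens see all earlier tokens (assumption);
                # every token may see the shared prefix (assumption).
                mask[i][j] = True
            elif seg_i == seg_j:
                # A document token sees earlier tokens of its own document only,
                # so no attention flows between different retrieved documents.
                mask[i][j] = True
    return mask


if __name__ == "__main__":
    # One prefix token, two 2-token documents, one suffix token.
    for row in sparse_document_mask(prefix_len=1, doc_lens=[2, 2], suffix_len=1):
        print("".join("1" if allowed else "." for allowed in row))
    # 1.....
    # 11....
    # 111...
    # 1..1..
    # 1..11.
    # 111111
```

Compared with a standard causal mask (the full lower triangle), the only change is that rows belonging to the second document no longer attend to positions of the first document. A boolean mask of this shape could, for example, be passed as `attn_mask` to PyTorch's `scaled_dot_product_attention`, where `True` marks positions that take part in attention; how SDAG integrates the mask into a specific model is not specified here.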
Taxonomy
Topics: Topic Modeling · Misinformation and Its Impacts · Adversarial Robustness in Machine Learning
