AdversarialCoT: Single-Document Retrieval Poisoning for LLM Reasoning
Hongru Song, Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

TL;DR
This paper introduces AdversarialCoT, a query-specific attack that poisons a single document in the retrieval corpus of a retrieval-augmented LLM system and substantially reduces the model's reasoning accuracy.
Contribution
It presents a query-specific poisoning attack that iteratively refines a single adversarial document through interactions with the target LLM, exposing reasoning weaknesses of LLMs within RAG systems.
Findings
A single adversarial document can substantially reduce LLM reasoning accuracy.
AdversarialCoT effectively uncovers subtle vulnerabilities in retrieval-augmented LLMs.
The method demonstrates the security risks inherent in current RAG systems.
Abstract
Retrieval-augmented generation (RAG) enhances large language model (LLM) reasoning by retrieving external documents, but also opens up new attack surfaces. We study knowledge-base poisoning attacks in RAG, where an attacker injects malicious content into the retrieval corpus, which is then naturally surfaced by the retriever and consumed by the LLM during reasoning. Unlike prior work that floods the corpus with poisoned documents, we propose AdversarialCoT, a query-specific attack that poisons only a single document in the corpus. AdversarialCoT first extracts the target LLM's reasoning framework to guide the construction of an initial adversarial chain-of-thought (CoT). The adversarial document is iteratively refined through interactions with the LLM, progressively exposing and exploiting critical reasoning vulnerabilities. Experiments on benchmark LLMs show that a single adversarial…
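The threat model in the abstract (one injected document that the retriever surfaces on its own) can be illustrated with a toy example. The sketch below is not the paper's method: it uses a simple bag-of-words cosine retriever as a stand-in for a real retriever, and the corpus, the query, the placeholder passage, and the function names (`tokenize`, `cosine`, `retrieve`) are all assumptions made for illustration. It only shows how a single document added to the corpus can enter the top-k context passed to the LLM.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a deliberate simplification.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(q, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy corpus and query (not from the paper).
clean_corpus = [
    "the eiffel tower is in paris",
    "paris is the capital of france",
    "berlin is the capital of germany",
]
query = "what is the capital of france"

# A single injected document. The content is a neutral placeholder;
# only its lexical overlap with the query matters for this sketch.
injected = "the capital of france is [placeholder claim]"

context_before = retrieve(query, clean_corpus, k=2)
context_after = retrieve(query, clean_corpus + [injected], k=2)
```

In this toy setting, `context_after` contains the injected passage while `context_before` does not, so whatever the passage says would be consumed by the LLM alongside a legitimate document. A real system would use a learned retriever and a much larger corpus, where surfacing a single document is harder and depends on how closely it matches the target query.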
