Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

Huining Cui; Wei Liu

arXiv:2605.01782·cs.CR·May 5, 2026

TL;DR

This paper introduces RAGCharacter, a two-pass forensic framework for character-level traceback of poisoned spans in retrieval-augmented generation, enabling finer-grained evidence auditing.

Contribution

The paper presents a novel prompt-conditioned, black-box, character-level traceback method, together with an evaluation protocol for localizing poisoned evidence in RAG systems.

Findings

01. RAGCharacter outperforms baselines in localization accuracy.

02. It achieves a good balance between localization precision and over-attribution.

03. The method is effective across multiple datasets, attack types, and models.

Abstract

Retrieval-augmented generation (RAG) improves factual grounding by conditioning large language models on retrieved evidence, but it also opens a data-layer attack surface: poisoned corpus entries can steer outputs without changing model parameters. Existing defenses and traceback methods are largely passage-level, which is too coarse for modern attacks whose effective payload may be a short fabricated claim, trigger phrase, or hidden instruction embedded inside an otherwise benign chunk. We study black-box character-level poison traceback in RAG and present RAGCharacter, a two-pass forensic framework that localizes the responsible retrieved span for a concrete misgeneration event. Pass-0 runs standard RAG while logging a prompt-anchored execution trace. Pass-1 re-enters a triggered trace and performs event-conditioned traceback over prompt-used evidence via budgeted counterfactual…
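
The abstract is cut off before it describes Pass-1 in full, so the following is not the paper's procedure. It is only a minimal sketch of the general idea of black-box counterfactual masking under a query budget: mask parts of a retrieved chunk, re-run the system, and keep the smallest character span whose presence the misgeneration depends on. The function `localize_span`, the predicate `event_fires`, the mask character, and the toy evidence are all hypothetical names chosen for illustration. The sketch also assumes a single contiguous payload and an event that stops firing as soon as any payload character is masked.

```python
from typing import Callable, Tuple


def localize_span(evidence: str,
                  event_fires: Callable[[str], bool],
                  budget: int = 40,
                  mask: str = "#") -> Tuple[int, int]:
    """Return a character span [start, end) that the event depends on.

    `event_fires` is a hypothetical black-box check: it re-runs the RAG
    pipeline on the edited evidence and reports whether the misgeneration
    still occurs. At most `budget` such calls are made. If the budget runs
    out early, the returned span is wider than necessary but, under the
    stated assumptions, still contains the payload.
    """
    calls = 0

    def fires(text: str) -> bool:
        nonlocal calls
        calls += 1
        return event_fires(text)

    n = len(evidence)

    # Binary search for the longest prefix that can be masked while the
    # event still fires. Invariant: masking [0, lo) keeps the event.
    lo, hi = 0, n
    while lo < hi and calls < budget // 2:
        mid = (lo + hi + 1) // 2
        if fires(mask * mid + evidence[mid:]):
            lo = mid
        else:
            hi = mid - 1
    start = lo

    # Binary search for the longest suffix that can be masked while the
    # event still fires. Invariant: masking [hi, n) keeps the event.
    lo, hi = start, n
    while lo < hi and calls < budget:
        mid = (lo + hi) // 2
        if fires(evidence[:mid] + mask * (n - mid)):
            hi = mid
        else:
            lo = mid + 1
    end = hi

    return start, end


# Toy stand-in for the black box: the "misgeneration" happens exactly when
# the fabricated claim survives intact in the evidence (an assumption).
payload = "The capital of France is Berlin."
evidence = "Paris is lovely in spring. " + payload + " Visit the Louvre."
start, end = localize_span(evidence, lambda text: payload in text)
print(repr(evidence[start:end]))
```

Each probe costs one full pipeline run, so the two binary searches keep the number of calls logarithmic in the chunk length. A real payload may be non-contiguous or may degrade gradually rather than switch off, in which case this simple search would need a different strategy.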
