AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption

Yanting Wang; Runpeng Geng; Ying Chen; Jinyuan Jia

arXiv:2508.03793·cs.CL·April 21, 2026

TL;DR

AttnTrace is a context traceback method that uses an LLM's attention weights to identify, efficiently and accurately, the context texts that contributed most to a response. It improves interpretability and supports prompt injection detection in long-context LLM applications.

Contribution

The paper introduces AttnTrace, an attention-based traceback technique that outperforms existing methods in both accuracy and efficiency, and accompanies it with theoretical insights and practical applications.

Findings

01

AttnTrace is more accurate than state-of-the-art methods.

02

AttnTrace is more efficient, reducing traceback time significantly.

03

It improves prompt injection detection in long-context scenarios.

Abstract

Long-context large language models (LLMs), such as Gemini-2.5-Pro and Claude-Sonnet-4, are increasingly used to empower advanced AI systems, including retrieval-augmented generation (RAG) pipelines and autonomous agents. In these systems, an LLM receives an instruction along with a context--often consisting of texts retrieved from a knowledge database or memory--and generates a response that is contextually grounded by following the instruction. Recent studies have designed solutions to trace back to a subset of texts in the context that contributes most to the response generated by the LLM. These solutions have numerous real-world applications, including performing post-attack forensic analysis and improving the interpretability and trustworthiness of LLM outputs. While significant efforts have been made, state-of-the-art solutions such as TracLLM often lead to a high computation cost,…
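
To make the idea of context traceback concrete, here is a minimal sketch of attention-based attribution in general. It is not the AttnTrace algorithm, since the abstract above is truncated before the method is described. It assumes a hypothetical `attention` matrix (response tokens by context tokens, already averaged over heads and layers) and hypothetical `spans` marking which context tokens belong to which retrieved text. Each text is scored by its mean attention weight, and the top-scoring texts are returned.

```python
from typing import List, Tuple


def rank_context_texts(
    attention: List[List[float]],
    spans: List[Tuple[int, int]],
    top_k: int = 1,
) -> List[int]:
    """Return indices of the top_k context texts by mean attention.

    attention[r][c] is an assumed, pre-aggregated attention weight from
    response token r to context token c. spans[i] = (start, end) is the
    half-open token range of context text i (assumed non-empty).
    """
    scores = []
    for start, end in spans:
        # Sum the attention every response token pays to this text's tokens.
        total = sum(sum(row[start:end]) for row in attention)
        # Normalize by response length and text length so long texts
        # are not favored just for having more tokens.
        scores.append(total / (len(attention) * (end - start)))
    # Highest-scoring texts first.
    ranked = sorted(range(len(spans)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Toy example: 2 response tokens, 6 context tokens, 3 texts of 2 tokens each.
toy_attention = [
    [0.05, 0.05, 0.40, 0.30, 0.10, 0.10],
    [0.10, 0.00, 0.30, 0.50, 0.05, 0.05],
]
toy_spans = [(0, 2), (2, 4), (4, 6)]
print(rank_context_texts(toy_attention, toy_spans, top_k=2))
```

In this toy input the second text (index 1) receives most of the attention mass, so it ranks first. A real system would need to decide how to aggregate across layers and heads, which this sketch leaves out.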

Code & Models

Repositories

Wang-Yanting/AttnTrace
github
