TL;DR
AttnTrace is a novel method that leverages LLM attention weights for efficient and accurate context traceback, improving both the interpretability of long-context LLM applications and the detection of prompt injection in them.
Contribution
The paper introduces AttnTrace, a new attention-based traceback technique that outperforms existing methods in both accuracy and efficiency, and supports it with theoretical insights and practical applications.
Findings
AttnTrace is more accurate than state-of-the-art methods.
AttnTrace is more efficient, significantly reducing traceback time.
It improves prompt injection detection in long-context scenarios.
Abstract
Long-context large language models (LLMs), such as Gemini-2.5-Pro and Claude-Sonnet-4, are increasingly used to empower advanced AI systems, including retrieval-augmented generation (RAG) pipelines and autonomous agents. In these systems, an LLM receives an instruction along with a context (often consisting of texts retrieved from a knowledge database or memory) and generates a response that is contextually grounded by following the instruction. Recent studies have designed solutions to trace back to a subset of texts in the context that contributes most to the response generated by the LLM. These solutions have numerous real-world applications, including performing post-attack forensic analysis and improving the interpretability and trustworthiness of LLM outputs. While significant efforts have been made, state-of-the-art solutions such as TracLLM often lead to a high computation cost,…
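The abstract states the traceback task but not AttnTrace's actual algorithm. As a rough illustration of the general idea of attention-based traceback (not the authors' method), the sketch below scores each context text by the attention mass that response tokens place on its tokens and returns the top-K texts. The function name `trace_back`, the assumption that attention has already been averaged over layers and heads, and the half-open token-span format are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def trace_back(
    attention: Sequence[Sequence[float]],
    text_spans: Sequence[Tuple[int, int]],
    top_k: int = 2,
) -> List[int]:
    """Hypothetical attention-based traceback sketch (not AttnTrace itself).

    attention[i][j]: weight from response token i to context token j,
        assumed to be pre-averaged over layers and heads.
    text_spans[t] = (start, end): half-open context-token range of text t.
    Returns indices of the top_k texts, highest score first.
    """
    if not attention:
        raise ValueError("attention must contain at least one response token")
    scores = []
    for start, end in text_spans:
        # Total attention each response token assigns to this text's tokens,
        # averaged over all response tokens.
        total = sum(sum(row[start:end]) for row in attention)
        scores.append(total / len(attention))
    # Rank texts by score, descending, and keep the top_k.
    ranked = sorted(range(len(text_spans)), key=lambda t: scores[t], reverse=True)
    return ranked[:top_k]


# Toy example: 2 response tokens attending over 5 context tokens split into 3 texts.
attention = [
    [0.1, 0.1, 0.5, 0.2, 0.1],
    [0.0, 0.2, 0.6, 0.1, 0.1],
]
spans = [(0, 2), (2, 3), (3, 5)]
print(trace_back(attention, spans, top_k=2))  # [1, 2]
```

Summing over a span favors longer texts; dividing each text's score by its token count is one alternative, and which aggregation works better is an empirical question this sketch does not settle.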
