AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel

TL;DR
AttriBoT introduces a set of techniques that accelerate leave-one-out context attribution for large language models by more than 300x, making faithful context attribution practical at scale.
Contribution
The paper presents novel methods for efficiently approximating the leave-one-out (LOO) error used in context attribution. It combines cached activations, hierarchical attribution, and smaller proxy models that emulate large target LLMs.
Findings
Achieves over 300x speedup in computing context attributions.
Provides more faithful LOO error approximation than prior methods.
Makes computing context attributions for a response 30x faster than generating the response itself.
Abstract
The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which measures the change in the likelihood of the LLM's response when a given span of the context is removed, provides a principled way to perform context attribution, but can be prohibitively expensive to compute for large models. In this work, we introduce AttriBoT, a series of novel techniques for efficiently computing an approximation of the LOO error for context attribution. Specifically, AttriBoT uses cached activations to avoid redundant operations, performs hierarchical attribution to reduce computation, and emulates the behavior of large target models with smaller proxy models. Taken together, AttriBoT can provide a >300x speedup while…
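The exact LOO error described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and it includes none of AttriBoT's accelerations. The names `loo_attributions` and `toy_log_likelihood` are hypothetical, the toy scorer is only a stand-in for a real LLM's response log-likelihood, and the sign convention (full-context score minus ablated score) is an assumption.

```python
from typing import Callable, List, Sequence


def loo_attributions(
    spans: Sequence[str],
    response: str,
    log_likelihood: Callable[[Sequence[str], str], float],
) -> List[float]:
    """Return one leave-one-out score per context span.

    Score i = log p(response | all spans) - log p(response | all spans except i).
    A large positive score means that removing span i makes the response less likely.
    """
    full = log_likelihood(spans, response)
    scores = []
    for i in range(len(spans)):
        # Rebuild the context with span i removed and rescore the same response.
        ablated = [s for j, s in enumerate(spans) if j != i]
        scores.append(full - log_likelihood(ablated, response))
    return scores


def toy_log_likelihood(context: Sequence[str], response: str) -> float:
    """Hypothetical stand-in for an LLM: penalizes response words missing from the context."""
    context_words = set(" ".join(context).lower().split())
    response_words = response.lower().split()
    hits = sum(1 for w in response_words if w in context_words)
    return float(hits - len(response_words))  # 0.0 when every response word is supported


spans = [
    "Paris is the capital of France.",
    "The weather was mild.",
    "Trains run hourly.",
]
scores = loo_attributions(spans, "paris is the capital", toy_log_likelihood)
print(scores)
```

With a real model, each call to `log_likelihood` is a forward pass over the whole context, so exact LOO attribution over n spans needs n + 1 passes. That cost is what the caching, hierarchical attribution, and proxy-model techniques named in the abstract aim to reduce.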
Taxonomy
Topics: Recommender Systems and Techniques · Data Management and Algorithms · Human Pose and Action Recognition
