AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

Fengyuan Liu; Nikhil Kandpal; Colin Raffel

arXiv:2411.15102 · cs.LG · March 24, 2025

TL;DR

AttriBoT is a set of techniques that accelerate leave-one-out (LOO) context attribution for large language models by more than 300x while staying faithful to the target model's LOO error.

Contribution

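The paper presents methods for efficiently approximating the leave-one-out (LOO) error used in context attribution. It combines three ideas: caching activations to avoid redundant computation, hierarchical attribution to reduce the number of evaluations, and smaller proxy models that emulate a large target LLM.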

Findings

01. Achieves over 300x speedup in computing context attributions.

02. Provides more faithful LOO error approximation than prior methods.

03. Enables attributions to be computed 30x faster than generating responses.

Abstract

The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which measures the change in the likelihood of the LLM's response when a given span of the context is removed, provides a principled way to perform context attribution, but can be prohibitively expensive to compute for large models. In this work, we introduce AttriBoT, a series of novel techniques for efficiently computing an approximation of the LOO error for context attribution. Specifically, AttriBoT uses cached activations to avoid redundant operations, performs hierarchical attribution to reduce computation, and emulates the behavior of large target models with smaller proxy models. Taken together, AttriBoT can provide a >300x speedup while…
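
The LOO error described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `make_toy_log_lik` is a hypothetical additive stand-in for an LLM's log-likelihood of a fixed response, and `hierarchical_loo` is only one plausible reading of "hierarchical attribution" (score groups of spans first, then score individual spans only inside the top-scoring groups). The grouping, the `top_k` parameter, and the use of `None` for unscored spans are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A scoring function: log-likelihood of a fixed response given context spans.
LogLik = Callable[[Sequence[str]], float]


def make_toy_log_lik(weights: Dict[str, float]) -> LogLik:
    """Hypothetical stand-in for an LLM: each present span adds its weight."""
    def log_lik(context: Sequence[str]) -> float:
        return -5.0 + sum(weights.get(span, 0.0) for span in context)
    return log_lik


def loo_scores(spans: Sequence[str], log_lik: LogLik) -> List[float]:
    """Exact LOO: drop in response log-likelihood when each span is removed."""
    spans = list(spans)
    full = log_lik(spans)
    return [full - log_lik(spans[:i] + spans[i + 1:]) for i in range(len(spans))]


def hierarchical_loo(
    groups: Sequence[Sequence[str]], log_lik: LogLik, top_k: int
) -> List[Optional[float]]:
    """Score whole groups first, then score spans only in the top_k groups.

    Spans in pruned groups are left unscored (None). This is an assumed
    pruning scheme, not necessarily the one used in the paper.
    """
    flat = [span for group in groups for span in group]
    full = log_lik(flat)

    # Group-level LOO: remove an entire group at once.
    group_scores = []
    for gi in range(len(groups)):
        rest = [s for gj, g in enumerate(groups) if gj != gi for s in g]
        group_scores.append(full - log_lik(rest))

    order = sorted(range(len(groups)), key=lambda gi: group_scores[gi], reverse=True)
    keep = set(order[:top_k])

    # Span-level LOO, restricted to the groups that survived pruning.
    scores: List[Optional[float]] = []
    offset = 0
    for gi, group in enumerate(groups):
        for j in range(len(group)):
            if gi in keep:
                idx = offset + j
                scores.append(full - log_lik(flat[:idx] + flat[idx + 1:]))
            else:
                scores.append(None)
        offset += len(group)
    return scores


toy = make_toy_log_lik({"b": 2.0, "d": 0.5})
exact = loo_scores(["a", "b", "c", "d"], toy)
pruned = hierarchical_loo([["a", "b"], ["c", "d"]], toy, top_k=1)
```

With a real model, each `log_lik` call is a forward pass, so the cost of exact LOO grows with the number of spans; pruning at the group level trades some coverage for fewer calls when contexts contain many spans per group.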

Code & Models

Repositories

r-three/AttriBoT (PyTorch, official)

Taxonomy

Topics: Recommender Systems and Techniques · Data Management and Algorithms · Human Pose and Action Recognition