ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs

Landon Butler; Abhineet Agarwal; Justin Singh Kang; Yigit Efe Erginbas; Bin Yu; Kannan Ramchandran

arXiv:2505.17495·cs.LG·October 27, 2025

TL;DR

ProxySPEX is an inference-efficient interaction attribution method for large language models. It exploits the hierarchical structure of sparse feature interactions to reconstruct LLM outputs more faithfully than marginal attribution while using about 10x fewer inferences than SPEX.

Contribution

ProxySPEX exploits the hierarchical structure of feature interactions in LLMs, where higher-order interactions tend to appear alongside their lower-order subsets, to attribute interactions accurately and at scale with fewer inferences than previous methods.

Findings

01 Reconstructs LLM outputs 20% more faithfully than marginal attribution

02 Uses 10x fewer inferences than SPEX

03 Effectively identifies influential feature interactions in high-dimensional data

Abstract

Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features. To identify these interactions, most existing approaches require enumerating all possible combinations of features up to a given order, causing them to scale poorly with the number of inputs $n$. Recently, Kang et al. (2025) proposed SPEX, an information-theoretic approach that uses interaction sparsity to scale to $n \approx 10^{3}$ features. SPEX greatly improves upon prior methods but requires tens of thousands of model inferences, which can be prohibitive for large models. In this paper, we observe that LLM feature interactions are often hierarchical -- higher-order interactions are accompanied by their lower-order subsets -- which enables more efficient discovery. To exploit this hierarchy, we propose ProxySPEX, an interaction attribution algorithm that first…
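The two ideas in the abstract, the cost of exhaustive enumeration and the hierarchy property, can be made concrete with a small sketch. It is an illustration only and not part of ProxySPEX: the helper names `num_interactions` and `is_hierarchical` are hypothetical, and the hierarchy check uses a strict reading (every non-empty proper subset must be present), whereas the abstract only says interactions are "often" hierarchical.

```python
from itertools import combinations
from math import comb


def num_interactions(n, max_order):
    """Count feature subsets of size 1..max_order among n features.

    This is the number of candidates an exhaustive method would enumerate.
    """
    return sum(comb(n, k) for k in range(1, max_order + 1))


def is_hierarchical(interactions):
    """Strict hierarchy check (an assumption made for illustration):
    every non-empty proper subset of each interaction must also be present.
    """
    present = {frozenset(s) for s in interactions}
    for s in present:
        for k in range(1, len(s)):
            for sub in combinations(sorted(s), k):
                if frozenset(sub) not in present:
                    return False
    return True


# Candidate counts at the n = 1000 scale mentioned in the abstract.
print(num_interactions(1000, 2))  # 500500
print(num_interactions(1000, 3))  # 166667500

# A pairwise interaction accompanied by both of its single features.
print(is_hierarchical([{0}, {1}, {0, 1}]))  # True
# A third-order interaction missing its pairs and one single feature.
print(is_hierarchical([{0, 1, 2}, {0}, {1}]))  # False
```

Even at order 3 the candidate count exceeds 10^8 for 1000 features, which is why enumeration scales poorly. A hierarchical interaction set lets a search grow from low-order terms rather than scanning all subsets.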
