SPEX: Scaling Feature Interaction Explanations for LLMs
Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchandran, Bin Yu

TL;DR
SPEX is a scalable, model-agnostic explanation method that efficiently identifies feature interactions in large language models, outperforming existing methods on long-input datasets and aligning with human annotations.
Contribution
SPEX introduces a novel sparse Fourier transform-based approach to scale interaction attributions for LLMs to large input lengths, leveraging natural data sparsity.
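To illustrate what a sparse Fourier spectrum over feature interactions means, here is a minimal sketch on a toy masking function. It is not the SPEX algorithm: it enumerates all 2^n masks by brute force, which is exactly the cost a sparse Fourier transform is meant to avoid. The names `toy_value` and `boolean_fourier`, and the toy function itself, are assumptions made for illustration only.

```python
import itertools
import numpy as np

def toy_value(mask):
    # Hypothetical stand-in for a model output when inputs are masked
    # (1 = feature kept, 0 = feature removed). Feature 0 acts alone,
    # features 1 and 2 interact, feature 3 is irrelevant.
    return 2.0 * mask[0] + 3.0 * mask[1] * mask[2]

def boolean_fourier(value_fn, n):
    # Brute-force Boolean Fourier (Walsh-Hadamard) transform:
    # F(k) = 2^{-n} * sum_m f(m) * (-1)^{<k, m>}
    masks = list(itertools.product([0, 1], repeat=n))
    values = np.array([value_fn(m) for m in masks])
    coeffs = {}
    for k in masks:
        signs = np.array(
            [(-1) ** sum(ki * mi for ki, mi in zip(k, m)) for m in masks]
        )
        coeffs[k] = float(np.mean(values * signs))
    return coeffs

coeffs = boolean_fourier(toy_value, n=4)
# Keep only the coefficients that are not numerically zero.
nonzero = {k: c for k, c in coeffs.items() if abs(c) > 1e-9}
for k, c in sorted(nonzero.items()):
    print(k, round(c, 3))
```

In this toy case only 5 of the 16 coefficients are nonzero, and the index `(0, 1, 1, 0)` marks the interaction between features 1 and 2. A spectrum this sparse is the kind of structure the source says SPEX exploits to reach large input lengths.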
Findings
SPEX outperforms marginal attribution methods by up to 20% on large inputs.
SPEX accurately identifies key features and interactions influencing model outputs.
SPEX's explanations align with human annotations and reveal reasoning patterns in LLMs.
Abstract
Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths. We propose Spectral Explainer (SPEX), a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths. SPEX exploits underlying natural sparsity among interactions -- common in real-world data -- and applies a sparse Fourier transform using a channel decoding algorithm to efficiently identify important interactions. We perform experiments across three difficult long-context datasets that require LLMs to utilize interactions between inputs to complete the task. For large inputs, SPEX outperforms marginal…
Taxonomy
Topics: Scientific Computing and Data Management · Natural Language Processing Techniques · Topic Modeling
Methods: Shapley Additive Explanations · ALIGN
