SPEX: Scaling Feature Interaction Explanations for LLMs

Justin Singh Kang; Landon Butler; Abhineet Agarwal; Yigit Efe Erginbas; Ramtin Pedarsani; Kannan Ramchandran; Bin Yu

arXiv:2502.13870·cs.LG·February 20, 2025

TL;DR

SPEX is a scalable, model-agnostic explanation method that efficiently identifies feature interactions in large language models, outperforming existing methods on long-input datasets and aligning with human annotations.

Contribution

SPEX introduces a novel sparse Fourier transform-based approach to scale interaction attributions for LLMs to large input lengths, leveraging natural data sparsity.
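To make the idea of Fourier-based interaction attribution concrete, here is a minimal sketch in Python. It is not the SPEX algorithm: it computes every Fourier (Walsh-Hadamard) coefficient by brute force over all $2^n$ masking patterns, which is only feasible for tiny $n$, whereas SPEX is described as using a sparse transform with a channel decoding algorithm to avoid that enumeration. The function `toy_model` is a hypothetical stand-in for an LLM's output under feature masking, and all names here are assumptions made for illustration.

```python
import itertools
import numpy as np

def toy_model(mask):
    """Hypothetical value function: mask[i] = 1 keeps feature i, 0 masks it.

    Contains one main effect (feature 0), one pairwise interaction
    (features 1 and 2), and one weak main effect (feature 3).
    """
    return 2.0 * mask[0] + 1.5 * mask[1] * mask[2] - 0.5 * mask[3]

def fourier_coefficients(f, n):
    """Brute-force Boolean Fourier transform of f over {0,1}^n.

    F[k] = 2^{-n} * sum_m (-1)^{<k, m>} f(m), where k indexes a feature subset.
    """
    masks = list(itertools.product([0, 1], repeat=n))
    values = np.array([f(m) for m in masks], dtype=float)
    coeffs = {}
    for k in masks:
        signs = np.array(
            [(-1) ** sum(ki * mi for ki, mi in zip(k, m)) for m in masks],
            dtype=float,
        )
        coeffs[k] = float(np.mean(signs * values))
    return coeffs

n = 4
coeffs = fourier_coefficients(toy_model, n)

# Keep only the non-negligible coefficients: the spectrum is sparse.
significant = {k: c for k, c in coeffs.items() if abs(c) > 1e-9}

for k, c in sorted(significant.items(), key=lambda kv: -abs(kv[1])):
    subset = [i for i, bit in enumerate(k) if bit]
    print(f"features {subset}: coefficient {c:+.3f}")
```

In this toy case only 6 of the 16 coefficients are nonzero, and the coefficient indexed by features 1 and 2 together captures their interaction. This kind of sparsity is what a sparse transform can exploit at much larger input lengths.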

Findings

01

SPEX outperforms marginal attribution methods by up to 20% on large inputs, measured by how faithfully it reconstructs LLM outputs.

02

SPEX accurately identifies key features and interactions influencing model outputs.

03

SPEX's explanations align with human annotations and reveal reasoning patterns in LLMs.

Abstract

Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths ($\approx 20$). We propose Spectral Explainer (SPEX), a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths ($\approx 1000$). SPEX exploits underlying natural sparsity among interactions -- common in real-world data -- and applies a sparse Fourier transform using a channel decoding algorithm to efficiently identify important interactions. We perform experiments across three difficult long-context datasets that require LLMs to utilize interactions between inputs to complete the task. For large inputs, SPEX outperforms marginal…

Code & Models

Repositories

basics-lab/spectral-explain (official)

Videos

SPEX: Scaling Feature Interaction Explanations for LLMs · SlidesLive

Taxonomy

Topics: Scientific Computing and Data Management · Natural Language Processing Techniques · Topic Modeling

Methods: Shapley Additive Explanations · ALIGN