Attributions All the Way Down? The Metagame of Interpretability

Hubert Baniecki; Przemyslaw Biecek; Fabian Fumagalli

arXiv:2605.06295·cs.LG·May 8, 2026

TL;DR

This paper introduces the metagame, a framework for quantifying second-order interaction effects of model explanations. It proves that attributions decompose hierarchically into meta-attributions and demonstrates the framework on instruction-tuned language models, vision-language encoders, and text-to-image concepts.

Contribution

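The metagame treats an attribution method itself as a cooperative game and computes its Shapley value, yielding meta-attributions that measure the directional influence of one feature on another feature's attribution. The paper establishes these meta-attributions as directional extensions of existing interaction indices, supported by both theory and experiments.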

Findings

01. Hierarchical decomposition of attributions into meta-attributions.

02. Meta-attributions serve as directional extensions of interaction indices.

03. Empirical applications include token interactions, cross-modal similarity, and multimodal concept interpretation.

Abstract

We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $\phi(f)$ explaining a model $f$, we measure the directional influence of feature $j$ on the attribution of feature $i$, denoted as meta-attribution $\phi_{j \to i}(f)$, by treating the attribution method itself as a cooperative game and computing its Shapley value. Theoretically, we prove that attributions hierarchically decompose into meta-attributions, and establish these as directional extensions of existing interaction indices. Empirically, we demonstrate that the metagame delivers insights across diverse interpretability applications: (i) quantifying token interactions in instruction-tuned language models, (ii) explaining cross-modal similarity in vision-language encoders, and (iii) interpreting text-to-image concepts in…
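
The idea of treating the attribution method as a cooperative game can be illustrated with a small, exact computation. The abstract does not spell out the metagame's value function, so the sketch below makes an assumption: the metagame for a target feature i assigns to each coalition S the Shapley value of i in the original game restricted to S, and 0 when i is not in S. The names `shapley`, `meta_attributions`, and `toy_game` are hypothetical helpers for illustration, not the authors' implementation, and exhaustive enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley(value, players):
    """Exact Shapley values of the game `value`, restricted to `players`."""
    players = tuple(players)
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def meta_attributions(value, players, i):
    """Shapley values of the (assumed) metagame for target feature i.

    Assumption: the metagame's worth of a coalition is the attribution
    of i computed with only that coalition available, and 0 if i is absent.
    """
    def metagame(coalition):
        if i not in coalition:
            return 0.0
        return shapley(value, coalition)[i]

    return shapley(metagame, players)


def toy_game(coalition):
    """Hypothetical game: a main effect of feature 0 plus a 0-1 interaction."""
    v = 0.0
    if 0 in coalition:
        v += 1.0
    if 0 in coalition and 1 in coalition:
        v += 2.0
    return v


players = (0, 1, 2)
phi = shapley(toy_game, players)                 # first-order attributions
meta = meta_attributions(toy_game, players, 0)   # influence of each j on phi[0]
```

Under this assumed value function the metagame is worth 0 on the empty coalition, so the efficiency property of the Shapley value makes the meta-attributions for feature i sum to its first-order attribution. That is one concrete way to read the hierarchical decomposition claimed in the abstract, though the paper's own construction may differ.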
