ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

Florian Eichin, Yupei Du, Philipp Mondorf, Maria Matveev, Barbara Plank, and Michael A. Hedderich

arXiv:2505.20076·cs.LG·October 2, 2025

TL;DR

ExPLAIND is a unified, theoretically grounded framework that combines model, data, and training attribution methods to better understand complex model behaviors and training dynamics.

Contribution

It generalizes gradient path kernels to realistic optimizers, introduces novel influence scores, and jointly interprets model components and data during training.

Findings

01 · Accurately replicates CNN and Transformer models using kernel reformulation.

02 · Influence scores are effective for parameter pruning.

03 · Analyzes Grokking, confirming and refining its stages.

Abstract

Post-hoc interpretability methods typically attribute a model's behavior to its components, data, or training trajectory in isolation. This leads to explanations that lack a unified view and may miss key interactions. While combining existing methods or applying them at different training stages offers broader insights, such approaches usually lack theoretical support. In this work, we present ExPLAIND, a unified framework that integrates all these perspectives. First, we generalize recent work on gradient path kernels, which reformulate models trained by gradient descent as a kernel machine, to realistic settings like AdamW. We empirically validate that a CNN and a Transformer are accurately replicated by this reformulation. Second, we derive novel parameter- and step-wise influence scores from the kernel feature maps. Their effectiveness for parameter pruning is comparable to existing…
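The idea behind a gradient path kernel can be illustrated with a small numerical check. The sketch below is not the paper's method: it uses plain full-batch gradient descent on a toy model instead of AdamW, and all names (`model`, `grad_theta`, `train`, `path_reconstruction`) are hypothetical. It relies only on the fundamental theorem of calculus: the final prediction equals the initial prediction plus, for each training step, the integral of the parameter gradient along the straight segment between consecutive checkpoints.

```python
import numpy as np

def model(theta, x):
    # Toy model (an assumption for illustration): tanh(w . x + b), theta = [w..., b].
    return np.tanh(x @ theta[:-1] + theta[-1])

def grad_theta(theta, x):
    # Gradient of the scalar model output with respect to theta for one input x.
    z = x @ theta[:-1] + theta[-1]
    return (1.0 - np.tanh(z) ** 2) * np.append(x, 1.0)

def train(theta0, X, y, lr=0.1, steps=50):
    # Plain full-batch gradient descent on squared error; keeps every checkpoint.
    thetas = [theta0.copy()]
    theta = theta0.copy()
    for _ in range(steps):
        g = np.zeros_like(theta)
        for xi, yi in zip(X, y):
            g += 2.0 * (model(theta, xi) - yi) * grad_theta(theta, xi)
        theta = theta - lr * g / len(X)
        thetas.append(theta.copy())
    return thetas

def path_reconstruction(thetas, x, n_integration=100):
    # Initial prediction plus one path integral per training step,
    # each approximated with the midpoint rule on [0, 1].
    out = model(thetas[0], x)
    per_step = []
    ts = (np.arange(n_integration) + 0.5) / n_integration
    for a, b in zip(thetas[:-1], thetas[1:]):
        delta = b - a
        contrib = np.mean([grad_theta(a + t * delta, x) @ delta for t in ts])
        per_step.append(contrib)
        out += contrib
    return out, np.array(per_step)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))
y = np.tanh(X @ np.array([1.0, -2.0, 0.5]))
thetas = train(rng.normal(size=4) * 0.1, X, y)
x_test = rng.normal(size=3)

recon, per_step = path_reconstruction(thetas, x_test)
direct = model(thetas[-1], x_test)
print(abs(recon - direct))  # small numerical integration error
```

The `per_step` array gives a step-wise decomposition of the prediction. Because the decomposition only depends on the difference between consecutive checkpoints, the same bookkeeping applies to any update rule; handling AdamW's moments, weight decay, and mini-batching in kernel form is what the paper's theorem addresses.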

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. Rigorous theoretical extension: Theorem 3.1 extends EPK from basic gradient descent to AdamW with realistic training dynamics (weight decay, first/second moment estimates, mini-batching, learning rate schedules). The mathematical derivation is sound with complete proofs in Appendix D.1.
2. Exact model representation: Unlike approximate methods, ExPLAIND achieves perfect equivalence with the original model (100% accuracy match, zero KL divergence in Table 1) when using sufficient integration s

Weaknesses

1. Gap between theoretical contribution and practical utility, and scalability limitations: The EPK extension (Theorem 3.1) is mathematically sound and interesting, but the paper claims to provide a practical interpretability framework. It is demonstrated only on toy problems with manual analysis and no path to scale (ResNet9 on a 2-class CIFAR subset, a small Transformer on an algorithmic task). O(NDMO) memory complexity: N steps × D parameters × M samples × O outputs. Comput
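To make the O(NDMO) figure quoted by the reviewer concrete, here is a back-of-the-envelope sketch. The function name and the example sizes are hypothetical and not taken from the paper, and it assumes a dense tensor of 4-byte floats; an actual implementation might never materialize the full tensor.

```python
def influence_tensor_bytes(n_steps, n_params, n_samples, n_outputs, bytes_per_value=4):
    # Dense storage: one value per (step, parameter, sample, output) combination.
    return n_steps * n_params * n_samples * n_outputs * bytes_per_value

# Hypothetical sizes: 1,000 steps, 1M parameters, 1,000 samples, 2 outputs.
example = influence_tensor_bytes(1_000, 1_000_000, 1_000, 2)
print(f"{example / 1e12:.1f} TB")  # prints "8.0 TB"
```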

Reviewer 02 · Rating 2 · Confidence 2

Strengths

Theorem 3.1 is new, and brings the work of Bell et al. closer to realistic experimental setups. The sparsity gains in Sec 3.2 are interesting. I love to see the "Emergence of cyclic patterns in the kernel" (l413). I wonder whether we should expect similar phenomena in other families of problems (even artificial ones) that exhibit grokking. Overall, the paper proposes an interesting line of ideas to understand training dynamics.

Weaknesses

### Scope

Currently, I have issues with the motivation of the paper. While the extension of Bell et al. to AdamW is interesting, I am less sure I understand the usefulness of the tool in general. The experimental section is devoted to two setups: CNN training on CIFAR-2 (cats and dogs) for sparsity, and the mod 113 task used to exhibit grokking. These two tasks are rather artificial. Even grokking as a whole received recent criticism in its ability to accurately describe some phenomena (see Je

Reviewer 03 · Rating 8 · Confidence 3

Strengths

- Solid theoretical contribution: Clear extension of the Exact Path Kernel to AdamW with mini-batching, moments, and weight decay, stated as a formal theorem.
- Empirical faithfulness check: The EPK reformulation matches original models’ decisions with 100 integration steps (accuracy 1.0; near-zero KL).
- Unified, additive attributions: Influence tensors can be summed along axes to obtain parameter/data/step views that directly tie to predictions.
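The additivity mentioned in the last point above can be sketched with a stand-in tensor. The shapes and random values below are hypothetical and only illustrate the bookkeeping: if an influence tensor indexed by (step, parameter, sample) decomposes a prediction additively, then summing out different axes gives step, parameter, and data views that all account for the same total.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_params, n_samples = 5, 4, 3

# Stand-in influence tensor (random values, not computed from a real model).
influence = rng.normal(size=(n_steps, n_params, n_samples))

per_step = influence.sum(axis=(1, 2))    # training-step view
per_param = influence.sum(axis=(0, 2))   # model-parameter view
per_sample = influence.sum(axis=(0, 1))  # training-data view

total = influence.sum()
print(np.isclose(per_step.sum(), total),
      np.isclose(per_param.sum(), total),
      np.isclose(per_sample.sum(), total))
```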

Weaknesses

- Grokking analysis is insightful but mostly qualitative and on small models/tasks; generality to larger LLMs remains unproven.
- Pruning is used to “validate” importance scores rather than to deliver new SOTA compression results; comparisons are limited.

Code & Models

Repositories

mainlp/explaind · PyTorch · Official

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections · Pruning · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings