Delta-Audit: Explaining What Changes When Models Change
Arshia Hemmat, Afsaneh Fatemi

TL;DR
Delta-Attribution is a model-agnostic framework that explains the specific feature-level changes between model versions, helping to interpret why performance shifts occur after model updates.
Contribution
The paper introduces Delta-Attribution, a novel method for differencing feature attributions to explain model changes across versions.
Findings
Inductive-bias changes cause large, behavior-aligned deltas.
Cosmetic tweaks show minimal attribution differences.
Deeper Gradient Boosting models exhibit significant attribution redistribution.
Abstract
Model updates (new hyperparameters, kernels, depths, solvers, or data) change performance, but the \emph{reason} often remains opaque. We introduce \textbf{Delta-Attribution} (\mbox{$\Delta$-Attribution}), a model-agnostic framework that explains \emph{what changed} between versions $A$ and $B$ by differencing per-feature attributions: $\Delta\phi(x)=\phi_B(x)-\phi_A(x)$. We evaluate $\Delta\phi$ with a \emph{$\Delta$-Attribution Quality Suite} covering magnitude/sparsity (L1, Top-$k$, entropy), agreement/shift (rank-overlap@10, Jensen--Shannon divergence), behavioural alignment (Delta Conservation Error, DCE; Behaviour--Attribution Coupling, BAC; COF), and robustness (noise, baseline sensitivity, grouped occlusion). Instantiated via fast occlusion/clamping in standardized space with a class-anchored margin and baseline averaging, we audit 45 settings: five classical families…
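The core idea in the abstract, differencing per-feature attributions between two model versions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the generic `score_fn` callables (stand-ins for whatever margin each model exposes), and the exact forms of the two agreement metrics are assumptions made for illustration. It uses simple occlusion, clamping one feature at a time to a baseline value and averaging over several baselines.

```python
import numpy as np

def occlusion_attribution(score_fn, x, baselines):
    """Occlusion attribution averaged over baselines (illustrative).

    phi[j] is the mean drop in score when feature j is clamped to the
    baseline value, everything else held at x.
    """
    x = np.asarray(x, dtype=float)
    base_score = score_fn(x)
    phi = np.zeros_like(x)
    for b in baselines:
        b = np.asarray(b, dtype=float)
        for j in range(x.size):
            x_occ = x.copy()
            x_occ[j] = b[j]          # clamp feature j to the baseline
            phi[j] += base_score - score_fn(x_occ)
    return phi / len(baselines)

def delta_attribution(score_a, score_b, x, baselines):
    """Per-feature difference: attribution under B minus attribution under A."""
    phi_a = occlusion_attribution(score_a, x, baselines)
    phi_b = occlusion_attribution(score_b, x, baselines)
    return phi_b - phi_a

def normalized_abs(v, eps=1e-12):
    """Turn an attribution vector into a distribution over features."""
    a = np.abs(np.asarray(v, dtype=float))
    return a / (a.sum() + eps)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    kl = lambda u, v: float(np.sum(u * np.log2((u + eps) / (v + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_overlap_at_k(phi_a, phi_b, k=10):
    """Fraction of shared features among the top-k by absolute attribution."""
    top_a = set(np.argsort(-np.abs(phi_a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(phi_b))[:k].tolist())
    return len(top_a & top_b) / k

# Toy usage: two hypothetical linear "model versions" that differ only
# in the weight on feature 1.
w_a = np.array([1.0, 0.0, 2.0])
w_b = np.array([1.0, 3.0, 2.0])
score_a = lambda x: float(w_a @ x)
score_b = lambda x: float(w_b @ x)
x = np.ones(3)
baselines = [np.zeros(3)]

delta = delta_attribution(score_a, score_b, x, baselines)
l1_magnitude = float(np.abs(delta).sum())
```

For linear scores with a zero baseline, occlusion attribution reduces to weight times feature value, so the delta here is nonzero only for the feature whose weight changed. Real models will not decompose this cleanly, which is the motivation for the quality metrics the abstract lists.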
