Delta-Audit: Explaining What Changes When Models Change

Arshia Hemmat; Afsaneh Fatemi

arXiv:2508.19589 · cs.LG · August 28, 2025

TL;DR

Delta-Attribution is a model-agnostic framework that explains the specific feature-level changes between model versions, helping to interpret why performance shifts occur after model updates.

Contribution

The paper introduces Delta-Attribution, a novel method for differencing feature attributions to explain model changes across versions.

Findings

01. Inductive-bias changes cause large, behavior-aligned attribution deltas.

02. Cosmetic tweaks show minimal attribution differences.

03. Deeper Gradient Boosting models exhibit significant redistribution of feature attributions.

Abstract

Model updates (new hyperparameters, kernels, depths, solvers, or data) change performance, but the reason often remains opaque. We introduce Delta-Attribution (Δ-Attribution), a model-agnostic framework that explains what changed between versions $A$ and $B$ by differencing per-feature attributions: $\Delta\phi(x) = \phi_B(x) - \phi_A(x)$. We evaluate $\Delta\phi$ with a Δ-Attribution Quality Suite covering magnitude/sparsity (L1, Top-$k$, entropy), agreement/shift (rank-overlap@10, Jensen–Shannon divergence), behavioural alignment (Delta Conservation Error, DCE; Behaviour–Attribution Coupling, BAC; COΔF), and robustness (noise, baseline sensitivity, grouped occlusion). Instantiated via fast occlusion/clamping in standardized space with a class-anchored margin and baseline averaging, we audit 45 settings: five classical families…
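To make the differencing step concrete, here is a minimal sketch of occlusion-based delta attribution in NumPy. It is not the authors' implementation. The helper names (`margin`, `occlusion_attribution`), the two toy linear scorers `score_a` and `score_b`, the choice of class 0 as the anchor, and the two baselines are all assumptions made for illustration. The margin is assumed here to be the anchor-class score minus the best competing score, and the magnitude summaries at the end are one plausible reading of "L1" and "Top-$k$".

```python
import numpy as np

def margin(scores, anchor):
    """Assumed class-anchored margin: anchor score minus the best other score."""
    others = np.delete(scores, anchor, axis=-1)
    return scores[..., anchor] - others.max(axis=-1)

def occlusion_attribution(score_fn, x, baselines, anchor):
    """Per-feature attribution: drop in margin when feature j is clamped
    to a baseline value, averaged over all supplied baselines."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    idx = np.arange(d)
    full = margin(score_fn(x[None, :]), anchor)[0]
    phi = np.zeros(d)
    for b in baselines:
        clamped = np.tile(x, (d, 1))   # row j is a copy of x ...
        clamped[idx, idx] = b[idx]     # ... with feature j clamped to b[j]
        phi += full - margin(score_fn(clamped), anchor)
    return phi / len(baselines)

# Hypothetical model versions: B differs from A only in its weight on feature 2.
W_A = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
W_B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]])
score_a = lambda X: X @ W_A.T
score_b = lambda X: X @ W_B.T

x = np.array([1.0, 0.5, 1.0])                # one standardized input
baselines = [np.zeros(3), np.full(3, 0.5)]   # assumed baselines to average over
anchor = 0                                   # assumed anchor class

phi_a = occlusion_attribution(score_a, x, baselines, anchor)
phi_b = occlusion_attribution(score_b, x, baselines, anchor)
delta_phi = phi_b - phi_a                    # Delta-phi(x) = phi_B(x) - phi_A(x)

# Simple magnitude/sparsity summaries of the delta.
l1 = np.abs(delta_phi).sum()
top1_share = np.sort(np.abs(delta_phi))[::-1][:1].sum() / l1

print(delta_phi)  # [0.  0.  1.5]
```

In this toy case the whole delta lands on feature 2, the only feature whose weight changed between versions, so the top-1 share of the L1 mass is 1.0. Real models are nonlinear and per-feature occlusion ignores interactions, so deltas will generally be spread across features.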
