Comparing Explanations is Not Enough, Explain the Change: New Standards are Needed to Explain Behavioral Shifts in Large Language Models
Martino Ciaperoni, Marzio Di Vece, Roberto Pellungrini, Luca Pappalardo, Fosca Giannotti, Francesco Giannini

TL;DR
This paper argues that explaining behavioral shifts in large language models requires new standards and introduces a novel XAI paradigm, XAI$_\Delta$, focused on explaining the causal change between model checkpoints.
Contribution
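It proposes a new explainability framework, XAI$_\Delta$, designed to explicitly explain how interventions cause behavioral shifts in models. It addresses a limitation of current methods, which either treat models as static objects or merely compare independent explanations across checkpoints.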
Findings
Preliminary experiments highlight the need for XAI$_\Delta$ in governance.
XAI$_\Delta$ provides measurable, comparable explanations of model changes.
A transition report format aids documentation and compliance.
Abstract
Large-scale foundation models exhibit \emph{behavioral shifts} when subjected to interventions such as scaling, fine-tuning, reinforcement learning with human feedback, or in-context learning. Current explainability methods are structurally ill-suited to explain these shifts, because they either treat models as static objects, as traditional eXplainable AI (XAI) approaches do, or merely compare independent explanations across different checkpoints of a model. As a result, these approaches fail to explain the functional transition between two model instances in which a certain behavior has shifted following an intervention. This gap creates significant governance risks across jurisdictions including the EU AI Act, US state legislation, and Chinese AI regulations, which require documenting causal chains for substantial system modifications. This position paper argues that explaining…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Artificial Intelligence in Healthcare and Education
