The Neglected Baseline in Model Interpretation
Yongjin Cui, Xiaohui Fan

TL;DR
This paper highlights the importance of baselines in model interpretation, unifies various gradient-based methods, and proposes an improved interpretation method with better baseline handling.
Contribution
It reformulates model interpretation to emphasize baselines, unifies existing methods, and introduces a new approach with enhanced baseline selection for more accurate explanations.
Findings
A revised IG with an explicitly identified baseline outperforms previous methods.
Measuring the attribution error between the attribution result and the attribution target evaluates interpretation quality more precisely.
Interpretations from different layers reveal feature extraction stages.
Abstract
We observe that existing model interpretation methods generally ignore the baseline, and such neglect often results in imprecise or even incorrect interpretation. In this paper, we reformulate the task of model interpretation and the interpretation principles for model interpretation results to demonstrate the importance of the baseline. We further unify gradient-based methods, Integrated Gradients (IG) methods, and Taylor expansion, clarifying the connections among them and explicitly identifying the baseline for each method. On this basis, we analyze the flaws and errors in related model interpretation methods (IG, LayerCAM, ODAM, Difference Map). We advocate evaluating the quality of model interpretation results precisely through the attribution error between the attribution result and the attribution target, rather than adopting flawed evaluation methods, such as those based on…
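To make the role of the baseline concrete, here is a minimal sketch, not the authors' implementation. It compares Integrated Gradients with an explicit baseline against a first-order Taylor attribution on a toy model. Everything named here is an assumption introduced for illustration: the model `f`, its weights `W` and `B`, the helper names, and in particular `attribution_error`, which is defined as the gap between the summed attributions and the attribution target `f(x) - f(baseline)`. The paper's own error definition may differ.

```python
import math

# Hypothetical toy model: f(x) = tanh(w . x + b). Weights are made up.
W = [0.8, -0.5, 0.3]
B = 0.1

def f(x):
    """Scalar model output."""
    return math.tanh(sum(w * xi for w, xi in zip(W, x)) + B)

def grad_f(x):
    """Analytic gradient of f with respect to the input x."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    scale = 1.0 - math.tanh(z) ** 2
    return [scale * w for w in W]

def integrated_gradients(x, baseline, steps=200):
    """IG along the straight path from baseline to x (midpoint Riemann sum)."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * d for bi, d in zip(baseline, diff)]
        grad_sum = [s + g for s, g in zip(grad_sum, grad_f(point))]
    return [d * s / steps for d, s in zip(diff, grad_sum)]

def taylor_first_order(x, baseline):
    """First-order Taylor attribution: gradient at x times (x - baseline).

    With an all-zero baseline this reduces to gradient-times-input, which
    shows that a baseline is present even when it is not stated.
    """
    return [g * (xi - bi) for g, xi, bi in zip(grad_f(x), x, baseline)]

def attribution_error(attributions, x, baseline):
    """Assumed error measure: |sum of attributions - (f(x) - f(baseline))|."""
    return abs(sum(attributions) - (f(x) - f(baseline)))

if __name__ == "__main__":
    x = [1.0, 2.0, -1.0]
    baseline = [0.0, 0.0, 0.0]  # the baseline is chosen explicitly
    ig = integrated_gradients(x, baseline)
    taylor = taylor_first_order(x, baseline)
    print("IG error:    ", attribution_error(ig, x, baseline))
    print("Taylor error:", attribution_error(taylor, x, baseline))
```

On this toy model the IG attributions sum to the target almost exactly, while the first-order Taylor attributions leave a visible gap. Changing `baseline` changes both the attributions and the target they are measured against.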
