Measurably Stronger Explanation Reliability via Model Canonization
Franz Motzkus, Leander Weber, Sebastian Lapuschkin

TL;DR
This paper demonstrates that network canonization significantly improves the reliability of rule-based explanations for neural networks, validated through quantitative experiments on VGG-16 and ResNet18 models.
Contribution
It provides the first quantitative analysis of network canonization's effectiveness in enhancing explanation trustworthiness for modern neural architectures.
Findings
Canonization improves the reliability of rule-based attributions on the tested VGG-16 and ResNet18 models.
The quantitative evaluation complements earlier evidence for canonization, which had been qualitative only.
Results support canonization as a key step in trustworthy AI explanations.
Abstract
While rule-based attribution methods have proven useful for providing local explanations for Deep Neural Networks, explaining modern and more varied network architectures yields new challenges in generating trustworthy explanations, since the established rule sets might not be sufficient or applicable to novel network structures. As an elegant solution to the above issue, network canonization has recently been introduced. This procedure leverages the implementation-dependency of rule-based attributions and restructures a model into a functionally identical equivalent of alternative design to which established attribution rules can be applied. However, the idea of canonization and its usefulness have so far only been explored qualitatively. In this work, we quantitatively verify the beneficial effects of network canonization to rule-based attributions on VGG-16 and ResNet18 models with…
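The abstract describes canonization as restructuring a model into a functionally identical equivalent. A minimal sketch of that idea follows, assuming the widely used case of folding an inference-mode batch normalization layer into the affine layer before it. The truncated abstract does not name this specific transformation, so treat it as an illustrative assumption rather than the authors' exact procedure. All function names and toy values are hypothetical.

```python
import math

def linear(x, W, b):
    """Affine layer: y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed running statistics."""
    return [g * (yj - m) / math.sqrt(v + eps) + bt
            for yj, g, bt, m, v in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Return (W_f, b_f) of a single affine layer equal to linear -> batchnorm."""
    # Per-output scale applied by batch normalization.
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    # Scale each output row of the weight matrix.
    W_f = [[s * w for w in row] for s, row in zip(scale, W)]
    # Shift the bias by the running mean, rescale, then add the BN offset.
    b_f = [s * (bj - m) + bt for s, bj, m, bt in zip(scale, b, mean, beta)]
    return W_f, b_f

# Toy parameters (hypothetical values, chosen only for illustration).
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]
x = [1.0, -2.0, 0.5]

# Original two-layer computation versus the canonized single layer.
original = batchnorm_eval(linear(x, W, b), gamma, beta, mean, var)
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
canonized = linear(x, W_f, b_f)

# The two outputs should agree up to floating-point rounding.
max_diff = max(abs(a - c) for a, c in zip(original, canonized))
print(max_diff)
```

The folded layer computes the same function as the original pair, but an attribution rule written for plain affine layers can now be applied to it directly, with no separate rule needed for batch normalization.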
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling
