On Model Explanations with Transferable Neural Pathways
Xinmiao Lin, Wentao Bao, Qi Yu, Yu Kong

TL;DR
This paper introduces a method for explaining neural network models by generating neural pathways that are sparse, faithful to the model's predictions, and composed mainly of class-relevant neurons.
Contribution
It proposes the GEN-CNP model that learns to generate class-relevant neural pathways with instance-specific sparsity, improving interpretability over existing methods.
Findings
GEN-CNP effectively predicts neural pathways with high faithfulness.
Class-relevant pathways show high similarity within the same class.
Pathways enhance interpretability and faithfully explain model behavior.
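One way to make "high similarity within the same class" concrete is to compare pathways as binary masks over the same set of neurons. The sketch below uses Jaccard similarity purely as an illustrative assumption; the source does not state which similarity measure the paper uses, and the masks and the names `jaccard`, `cat_1`, `cat_2`, and `dog_1` are hypothetical.

```python
def jaccard(mask_a, mask_b):
    """Jaccard similarity between two binary pathway masks (lists of 0/1)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same neurons")
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Two empty pathways are treated as identical.
    return inter / union if union else 1.0


# Hypothetical pathways over 8 neurons (1 = neuron kept in the pathway).
cat_1 = [1, 1, 0, 0, 1, 0, 0, 0]
cat_2 = [1, 1, 0, 0, 0, 0, 1, 0]
dog_1 = [0, 0, 1, 1, 0, 1, 0, 0]

same_class = jaccard(cat_1, cat_2)    # two instances of the same class
cross_class = jaccard(cat_1, dog_1)   # instances of different classes
print(same_class, cross_class)
```

In this toy setup the two same-class masks share half of their active neurons, while the cross-class pair shares none, which is the qualitative pattern the finding describes.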
Abstract
Neural pathways as model explanations consist of a sparse set of neurons that provide the same level of prediction performance as the whole model. Existing methods primarily focus on accuracy and sparsity, but the generated pathways may offer limited interpretability and thus fall short in explaining the model behavior. In this paper, we suggest two interpretability criteria of neural pathways: (i) same-class neural pathways should primarily consist of class-relevant neurons; (ii) each instance's neural pathway sparsity should be optimally determined. To this end, we propose a Generative Class-relevant Neural Pathway (GEN-CNP) model that learns to predict the neural pathways from the target model's feature maps. We propose to learn class-relevant information from features of deep and shallow layers such that same-class neural pathways exhibit high similarity. We further impose a faithfulness…
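The basic idea of a neural pathway, a sparse subset of neurons that preserves the model's prediction, can be illustrated with a toy network. The following is a minimal sketch and not the GEN-CNP method: the weights, the input, and the hand-picked mask `pathway` are all hypothetical, whereas GEN-CNP learns to predict the pathway from the target model's feature maps.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(x, W1, W2, mask=None):
    """Forward pass of a toy two-layer net; `mask` zeros out hidden neurons."""
    h = relu(matvec(W1, x))
    if mask is not None:
        h = [hi * m for hi, m in zip(h, mask)]
    logits = matvec(W2, h)
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical weights: 2 inputs -> 4 hidden neurons -> 2 classes.
W1 = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1], [0.05, -0.05]]
W2 = [[1.0, 0.0, 0.1, 0.0], [0.0, 1.0, 0.1, 0.0]]
x = [2.0, 0.5]

# Hand-picked pathway: keep hidden neurons 0 and 1, drop 2 and 3.
pathway = [1, 1, 0, 0]

full_pred = predict(x, W1, W2)
masked_pred = predict(x, W1, W2, mask=pathway)

faithful = full_pred == masked_pred          # pathway preserves the prediction
sparsity = 1 - sum(pathway) / len(pathway)   # fraction of neurons removed
print(full_pred, masked_pred, faithful, sparsity)
```

Here the pathway drops half of the hidden neurons while leaving the predicted class unchanged. The abstract's second criterion concerns choosing this sparsity level per instance rather than fixing it in advance.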
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Materials Science
