Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification
Ruixuan Tang, Hanjie Chen, Yangfeng Ji

TL;DR
This paper investigates whether instability in post-hoc explanations for neural text classifiers stems from the models or the explanation methods, finding models are likely the primary source of fragility.
Contribution
The paper introduces an output probability perturbation method to isolate the source of explanation instability; the results point to the neural models as the main cause.
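The idea of perturbing the output side can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy classifier and its WEIGHTS are hypothetical, the perturbation is taken to be Gaussian noise added to the predicted probabilities followed by renormalization, and a simple leave-one-out (occlusion) explainer stands in for LIME or Kernel Shapley. The model is left untouched; only the probabilities handed to the explainer are perturbed, so any change in the explanation can be attributed to the explanation method rather than to the model.

```python
import math
import random

# Hypothetical per-token weights for a toy binary classifier (not from the paper).
WEIGHTS = [2.0, -1.0, 0.5, 0.0, -0.5]


def toy_predict_proba(mask):
    """Stand-in model: logistic score over a 0/1 token-presence mask."""
    logit = sum(w * m for w, m in zip(WEIGHTS, mask))
    p = 1.0 / (1.0 + math.exp(-logit))
    return [1.0 - p, p]


def perturb_output(probs, scale, rng):
    """Assumed perturbation: add Gaussian noise, clip to positive, renormalize."""
    noisy = [max(p + rng.gauss(0.0, scale), 1e-8) for p in probs]
    total = sum(noisy)
    return [p / total for p in noisy]


def occlusion_explanation(predict, mask, label):
    """Leave-one-out importance: drop in the label probability when a token is removed."""
    base = predict(mask)[label]
    scores = []
    for i in range(len(mask)):
        occluded = list(mask)
        occluded[i] = 0.0
        scores.append(base - predict(occluded)[label])
    return scores


def explanation_gap(scale, seed=0):
    """Explain the same input with clean and with output-perturbed probabilities."""
    rng = random.Random(seed)
    mask = [1.0] * len(WEIGHTS)

    def noisy_predict(m):
        # The model itself is unchanged; only its output is perturbed.
        return perturb_output(toy_predict_proba(m), scale, rng)

    clean = occlusion_explanation(toy_predict_proba, mask, label=1)
    noisy = occlusion_explanation(noisy_predict, mask, label=1)
    return clean, noisy


if __name__ == "__main__":
    clean, noisy = explanation_gap(scale=0.01)
    for i, (c, n) in enumerate(zip(clean, noisy)):
        print(f"token {i}: clean={c:+.3f} perturbed={n:+.3f}")
```

Comparing the two score lists (for example, by checking whether the token ranking is preserved) gives a rough stability signal for the explainer alone. The noise scale and the comparison metric are free choices in this sketch.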
Findings
Post-hoc explanations are stable under output probability perturbations.
Neural network models are the primary source of explanation fragility.
The proposed method effectively isolates the explanation method's influence.
Abstract
Recent work has observed that post-hoc explanations become unstable when input-side perturbations are applied to the model. This has raised both interest in and concern about the stability of post-hoc explanations. However, a question remains: is the instability caused by the neural network model or by the post-hoc explanation method? This work explores the potential source of unstable post-hoc explanations. To separate out the influence of the model, we propose a simple output probability perturbation method. Compared to prior input-side perturbation methods, the output probability perturbation method circumvents the neural model's potential effect on the explanations and allows analysis of the explanation method itself. We evaluate the proposed method with three widely-used post-hoc explanation methods (LIME (Ribeiro et al., 2016), Kernel Shapley (Lundberg and Lee, 2017a), and Sample…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging
