What Do You See? Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors
Yi-Shan Lin, Wen-Chuan Lee, Z. Berkay Celik

TL;DR
This paper proposes using backdoor trigger patterns as ground truth to automate and improve the evaluation of XAI methods' interpretability, revealing limitations of current approaches in identifying important input regions.
Contribution
It introduces a novel backdoor-based evaluation framework and metrics for assessing XAI explanations, demonstrating its effectiveness across multiple models and explanation methods.
Findings
Model-free methods outperform local explanation methods in identifying trigger regions.
Six explanation methods fail to fully highlight backdoor triggers.
Backdoor triggers serve as ground truth for evaluating explanation relevance.
Abstract
EXplainable AI (XAI) methods have been proposed to interpret how a deep neural network predicts inputs through model saliency explanations that highlight the parts of the inputs deemed important to arrive at a decision for a specific target. However, it remains challenging to quantify the correctness of their interpretability, as current evaluation approaches either require subjective input from humans or incur high computation cost with automated evaluation. In this paper, we propose backdoor trigger patterns--hidden malicious functionalities that cause misclassification--to automate the evaluation of saliency explanations. Our key observation is that triggers provide ground truth for inputs to evaluate whether the regions identified by an XAI method are truly relevant to its output. Since backdoor triggers are the most important features that cause deliberate misclassification, a robust XAI…
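The idea of scoring a saliency explanation against a known trigger region can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the helper names (`trigger_pixels`, `salient_pixels`, `iou`), the rectangular trigger, the top-k selection rule, and the toy saliency values are all hypothetical, and intersection-over-union is used here as a generic overlap score rather than as the paper's exact metric definition.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def trigger_pixels(top: int, left: int, height: int, width: int) -> Set[Pixel]:
    """Ground-truth region: pixels covered by a rectangular trigger patch (assumed shape)."""
    return {(r, c) for r in range(top, top + height) for c in range(left, left + width)}


def salient_pixels(saliency: List[List[float]], k: int) -> Set[Pixel]:
    """Return the k pixels with the highest saliency scores (ties broken by position)."""
    scored = [(-value, r, c) for r, row in enumerate(saliency) for c, value in enumerate(row)]
    scored.sort()  # most salient first, because the scores are negated
    return {(r, c) for _, r, c in scored[:k]}


def iou(predicted: Set[Pixel], truth: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets; 0.0 when both sets are empty."""
    union = predicted | truth
    return len(predicted & truth) / len(union) if union else 0.0


# Toy 4x4 saliency map for a trojaned input (made-up values, not real results).
saliency = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.9, 0.8],
    [0.0, 0.3, 0.7, 0.1],
]

# Assume a 2x2 trigger was stamped in the bottom-right corner.
truth = trigger_pixels(top=2, left=2, height=2, width=2)

# Keep as many salient pixels as the trigger has, then measure the overlap.
predicted = salient_pixels(saliency, k=len(truth))
score = iou(predicted, truth)
print(f"IoU between explanation and trigger: {score:.2f}")
```

In this toy case the explanation covers three of the four trigger pixels and adds one pixel outside the trigger, so the overlap score is below 1. An explanation that highlighted exactly the trigger region would score 1.0.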
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
Methods: Interpretability
