TL;DR
This paper introduces trend-based faithfulness tests for local explanation methods. It shows that these tests assess explanations of complex data more reliably than traditional tests across several tasks, which in turn improves model debugging.
Contribution
It proposes novel trend-based faithfulness tests that improve the evaluation of explanation methods, especially for complex data, and demonstrates their effectiveness through extensive empirical evaluation.
Findings
Traditional faithfulness tests are dominated by randomness, especially on complex data.
Trend-based tests better assess explanation faithfulness across image, language, and security tasks.
Model debugging benefits significantly from the improved faithfulness evaluation.
Abstract
While enjoying the great achievements brought by deep learning (DL), people are also worried about the decisions made by DL models, since the high degree of non-linearity of DL models makes these decisions extremely difficult to understand. Consequently, attacks such as adversarial attacks are easy to carry out but difficult to detect and explain, which has led to a boom in research on local explanation methods for explaining model decisions. In this paper, we evaluate the faithfulness of explanation methods and find that traditional faithfulness tests encounter the random dominance problem, i.e., random selection performs best, especially for complex data. To solve this problem, we propose three trend-based faithfulness tests and empirically demonstrate that the new trend tests can better assess faithfulness than traditional tests on image, natural language and…
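To make the "random dominance" comparison concrete, here is a minimal sketch of one common style of traditional faithfulness test, which we assume for illustration: delete the features an explanation ranks highest, measure how much the model output drops, and compare that drop with the average drop from randomly chosen features. The toy linear model, the gradient-times-input attribution, the zero deletion baseline, and the helper names `deletion_score` and `random_baseline` are all hypothetical and are not taken from the paper; the paper's trend-based tests are not shown here.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def deletion_score(model: Model, x: Sequence[float], ranking: Sequence[int],
                   k: int, baseline: float = 0.0) -> float:
    """Drop in model output after replacing the k top-ranked features with `baseline`.

    Hypothetical helper: a larger drop is read as a more faithful ranking.
    """
    x_del: List[float] = list(x)
    for i in ranking[:k]:
        x_del[i] = baseline
    return model(x) - model(x_del)


def random_baseline(model: Model, x: Sequence[float], k: int,
                    trials: int = 100, seed: int = 0) -> float:
    """Average deletion score when the k deleted features are chosen at random."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    total = 0.0
    for _ in range(trials):
        rng.shuffle(idx)  # a fresh uniformly random ordering each trial
        total += deletion_score(model, x, idx, k, )
    return total / trials


# Toy linear model (hypothetical) and input.
weights = [3.0, -1.0, 0.5, 2.0]
x = [1.0, 2.0, 4.0, 1.0]


def model(v: Sequence[float]) -> float:
    return sum(w * vi for w, vi in zip(weights, v))


# Gradient-times-input attribution; for a linear model this is w_i * x_i.
attributions = [w * xi for w, xi in zip(weights, x)]
ranking = sorted(range(len(x)), key=lambda i: attributions[i], reverse=True)

explained = deletion_score(model, x, ranking, k=2)
randomized = random_baseline(model, x, k=2)
print(f"explanation: {explained:.2f}  random: {randomized:.2f}")
```

On this toy input the explanation clearly beats random selection. Under the abstract's description, random dominance would correspond to `randomized` matching or exceeding `explained`, which the paper reports happens with traditional tests on complex data.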
