Logic Traps in Evaluating Attribution Scores
Yiming Ju, Yuanzhe Zhang, Zhao Yang, Zhongtao Jiang, Kang Liu, Jun, Zhao

TL;DR
This paper reviews and highlights common logical errors in evaluating attribution methods for deep learning models, emphasizing the need for more reliable evaluation practices.
Contribution
It systematically identifies and demonstrates key logic traps in attribution score evaluation methods, advocating for improved evaluation reliability.
Findings
Identifies critical logic traps in current evaluation methods
Demonstrates existence of traps through experiments
Calls for focus on reducing evaluation flaws
Abstract
Modern deep learning models are notoriously opaque, which has motivated the development of methods for interpreting how deep models predict. This goal is usually approached with attribution method, which assesses the influence of features on model predictions. As an explanation method, the evaluation criteria of attribution methods is how accurately it re-reflects the actual reasoning process of the model (faithfulness). Meanwhile, since the reasoning process of deep models is inaccessible, researchers design various evaluation methods to demonstrate their arguments. However, some crucial logic traps in these evaluation methods are ignored in most works, causing inaccurate evaluation and unfair comparison. This paper systematically reviews existing methods for evaluating attribution scores and summarizes the logic traps in these methods. We further conduct experiments to demonstrate the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
