Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods
Josip Jukić, Martin Tutek, Jan Šnajder

TL;DR
This paper investigates how to better evaluate and improve agreement between saliency methods in neural NLP models, proposing Pearson-$r$ as a more suitable metric and linking explanation agreement to model properties and training dynamics.
Contribution
It demonstrates that Pearson-$r$ better captures agreement between saliency methods than rank correlation and shows how regularization improves explanation agreement, especially for easy instances.
Findings
Pearson-$r$ outperforms rank correlation in agreement evaluation.
Regularization techniques increase saliency explanation agreement.
Agreement varies with instance difficulty and training dynamics.
Abstract
A popular approach to unveiling the black box of neural NLP models is to leverage saliency methods, which assign scalar importance scores to each input component. A common practice for evaluating whether an interpretability method is faithful has been to use evaluation-by-agreement -- if multiple methods agree on an explanation, its credibility increases. However, recent work has found that saliency methods exhibit weak rank correlations even when applied to the same model instance and advocated for the use of alternative diagnostic methods. In our work, we demonstrate that rank correlation is not a good fit for evaluating agreement and argue that Pearson-$r$ is a better-suited alternative. We further show that regularization techniques that increase faithfulness of attention explanations also increase agreement between saliency methods. By connecting our findings to instance categories…
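The contrast between rank correlation and Pearson-$r$ can be illustrated with a toy example. The sketch below is not taken from the paper: the two score vectors are invented, Spearman's rho stands in for "rank correlation" as one possible choice, and the helper names `pearson`, `ranks`, and `spearman` are hypothetical. It assumes two saliency methods that agree on which two tokens matter but order the remaining near-zero tokens differently.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(x):
    """Ascending ranks starting at 1; assumes no tied values (toy simplification)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    result = [0] * len(x)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rho, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented saliency scores for an 8-token input (illustration only).
# Both methods give high importance to tokens 0 and 5; the remaining
# tokens receive small scores whose ordering is reversed between methods.
scores_a = [0.90, 0.01, 0.02, 0.03, 0.04, 0.80, 0.05, 0.06]
scores_b = [0.85, 0.06, 0.05, 0.04, 0.03, 0.90, 0.02, 0.01]

print(f"Pearson r:    {pearson(scores_a, scores_b):.3f}")
print(f"Spearman rho: {spearman(scores_a, scores_b):.3f}")
```

In this constructed case Pearson-$r$ is high while the rank correlation is low, because ranking treats the reordering of negligible scores as seriously as a disagreement about the important tokens. Whether real saliency maps behave this way is an empirical question that the paper addresses; this sketch only shows the mechanism.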
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Materials Science
