Evaluating Saliency Methods for Neural Language Models
Shuoyang Ding, Philipp Koehn

TL;DR
This paper systematically evaluates saliency methods for neural language models, measuring the plausibility and faithfulness of their interpretations on four datasets, and argues that these methods should be validated before use.
Contribution
The paper provides a comprehensive, quantitative comparison of saliency methods for neural language models and emphasizes that interpretations should be validated before they are trusted.
Findings
Saliency methods often produce low-quality interpretations.
Different datasets reveal varying effectiveness of saliency methods.
Validation is crucial before deploying saliency-based explanations.
Abstract
Saliency methods are widely used to interpret neural network predictions, but different variants of saliency methods often disagree even on the interpretations of the same prediction made by the same model. In these cases, how do we identify when these interpretations are trustworthy enough to be used in analyses? To address this question, we conduct a comprehensive and quantitative evaluation of saliency methods on a fundamental category of NLP models: neural language models. We evaluate the quality of prediction interpretations from two perspectives that each represent a desirable property of these interpretations: plausibility and faithfulness. Our evaluation is conducted on four different datasets constructed from the existing human annotation of syntactic and semantic agreements, at both the sentence level and the document level. Through our evaluation, we identified various ways saliency…
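As a rough illustration of how two saliency variants can disagree on the same prediction from the same model, the sketch below compares plain gradient saliency with gradient-times-input saliency on a toy linear scorer. Everything here is an assumption made for illustration: the linear scorer stands in for a language model, the weights and inputs are made up, and these two variants are not necessarily the ones the paper evaluates.

```python
# Toy illustration only: a linear scorer stands in for a neural language model.
# All names and numbers below are hypothetical.

def gradient_saliency(weights, inputs):
    # For score = sum(w_i * x_i), the gradient with respect to x_i is w_i,
    # so plain gradient saliency is |w_i| regardless of the input value.
    return [abs(w) for w in weights]

def gradient_times_input_saliency(weights, inputs):
    # Gradient-times-input scales each gradient by the feature value.
    return [abs(w * x) for w, x in zip(weights, inputs)]

def top_feature(scores):
    # Index of the feature with the highest saliency score.
    return max(range(len(scores)), key=lambda i: scores[i])

weights = [2.0, -0.5, 1.0]  # hypothetical model parameters
inputs = [0.1, 4.0, 1.0]    # hypothetical input features

grad = gradient_saliency(weights, inputs)
grad_x_input = gradient_times_input_saliency(weights, inputs)

# The two variants pick different "most important" features.
print(top_feature(grad), top_feature(grad_x_input))
```

With these made-up numbers the two variants rank different features first, which is the kind of disagreement that motivates checking interpretations against an external reference before relying on them.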
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling
