Evaluating Saliency Methods for Neural Language Models

Shuoyang Ding; Philipp Koehn

arXiv:2104.05824 · cs.CL · April 14, 2021 · 6 cites



TL;DR

This paper systematically evaluates saliency methods for neural language models, measuring the plausibility and faithfulness of their interpretations on four datasets, and concludes that saliency methods should be validated before use.

Contribution

The paper provides a comprehensive, quantitative comparison of saliency methods for neural language models and argues that interpretations should be validated before they are trusted.

Findings

01 Saliency methods often produce low-quality interpretations.

02 The effectiveness of saliency methods varies across datasets.

03 Saliency-based explanations should be validated before they are deployed.

Abstract

Saliency methods are widely used to interpret neural network predictions, but different variants of saliency methods often disagree even on the interpretation of the same prediction made by the same model. In these cases, how do we identify when these interpretations are trustworthy enough to be used in analyses? To address this question, we conduct a comprehensive and quantitative evaluation of saliency methods on a fundamental category of NLP models: neural language models. We evaluate the quality of prediction interpretations from two perspectives, each representing a desirable property of these interpretations: plausibility and faithfulness. Our evaluation is conducted on four different datasets constructed from existing human annotations of syntactic and semantic agreement, at both the sentence level and the document level. Through our evaluation, we identified various ways saliency…


Code & Models

Repositories

shuoyangd/tarsius
none · Official


Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling