On the Sensitivity and Stability of Model Interpretations in NLP

Fan Yin; Zhouxing Shi; Cho-Jui Hsieh; Kai-Wei Chang

arXiv:2104.08782 · cs.CL · April 4, 2022 · 1 citation

TL;DR

This paper introduces sensitivity and stability as two new criteria for evaluating the faithfulness of NLP model interpretations, shows that conclusions about faithfulness vary substantially with the criterion used, and proposes interpretation methods based on adversarial robustness that outperform gradient-based approaches on the new criteria.

Contribution

The paper proposes two new criteria for interpretation faithfulness, sensitivity and stability, and develops a new class of interpretation methods based on adversarial robustness that addresses limitations of existing approaches.

Findings

1. Interpretation faithfulness varies significantly across different criteria.
2. Proposed methods outperform gradient-based methods on new criteria.
3. Application to dependency parsing broadens the scope of interpretation evaluation.

Abstract

Recent years have witnessed the emergence of a variety of post-hoc interpretations that aim to uncover how natural language processing (NLP) models make predictions. Despite the surge of new interpretation methods, it remains an open problem how to define and quantitatively measure the faithfulness of interpretations, i.e., to what extent interpretations reflect the reasoning process of a model. We propose two new criteria, sensitivity and stability, that provide notions of faithfulness complementary to the existing removal-based criteria. Our results show that conclusions about how faithful interpretations are can vary substantially depending on the notion used. Motivated by the desiderata of sensitivity and stability, we introduce a new class of interpretation methods that adopt techniques from adversarial robustness. Empirical results show that our proposed methods are effective…

Code & Models

Repositories

uclanlp/nlp-interpretation-faithfulness (Official)

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques