Robustness of Explanation Methods for NLP Models

Shriya Atmakuri; Tejas Chheda; Dinesh Kandula; Nishant Yadav; Taesung Lee; Hessel Tuinhof

arXiv:2206.12284 · cs.CL · June 27, 2022 · 1 citation

Open Access

TL;DR

This paper investigates the robustness of explanation methods for NLP models, revealing their vulnerability to adversarial attacks that can significantly distort explanations with minimal input changes.

Contribution

To the authors' knowledge, this is the first study to evaluate the adversarial robustness of explanation methods for text. It provides initial insights toward a successful attack and demonstrates that explanations are susceptible to small input perturbations.

Findings

01. The explanation method can be largely disturbed for up to 86% of the tested samples with small changes to the input sentence.

02. To the authors' knowledge, this is the first evaluation of the adversarial robustness of an explanation method on text.

03. Small changes to the input sentence and its semantics can significantly alter explanations.

Abstract

Explanation methods have emerged as an important tool to highlight the features responsible for the predictions of neural networks. There is mounting evidence that many explanation methods are rather unreliable and susceptible to malicious manipulations. In this paper, we particularly aim to understand the robustness of explanation methods in the context of text modality. We provide initial insights and results towards devising a successful adversarial attack against text explanations. To our knowledge, this is the first attempt to evaluate the adversarial robustness of an explanation method. Our experiments show the explanation method can be largely disturbed for up to 86% of the tested samples with small changes in the input sentence and its semantics.
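One way to make "largely disturbed" concrete is to compare the top-ranked tokens of an explanation before and after a small input change. The sketch below is illustrative only and is not the paper's attack or its explanation method: the toy lexicon `WEIGHTS`, the sigmoid scorer `score`, the leave-one-out attribution, and the top-k overlap metric are all assumptions chosen to keep the example self-contained.

```python
import math

# Hypothetical toy lexicon (an assumption for illustration, not from the paper).
WEIGHTS = {"great": 2.0, "good": 1.5, "fine": 0.3,
           "boring": -2.0, "plot": 0.1, "acting": 0.1}


def score(tokens):
    """Toy sentiment score in (0, 1): sigmoid of the summed word weights."""
    z = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def occlusion_attributions(tokens, score_fn=score):
    """Leave-one-out attribution: change in score when each token is removed."""
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]


def top_k_positions(attributions, k):
    """Indices of the k tokens with the largest absolute attribution."""
    order = sorted(range(len(attributions)),
                   key=lambda i: abs(attributions[i]), reverse=True)
    return set(order[:k])


def top_k_overlap(attr_a, attr_b, k):
    """Fraction of top-k positions shared by two equal-length explanations."""
    shared = top_k_positions(attr_a, k) & top_k_positions(attr_b, k)
    return len(shared) / k


original = "the acting was good but the plot was boring".split()
# A one-word substitution stands in for a "small change in the input sentence".
perturbed = "the acting was fine but the plot was boring".split()

attr_orig = occlusion_attributions(original)
attr_pert = occlusion_attributions(perturbed)
overlap = top_k_overlap(attr_orig, attr_pert, k=2)

print(f"prediction: {score(original):.3f} -> {score(perturbed):.3f}")
print(f"top-2 overlap between explanations: {overlap:.2f}")
```

A low overlap combined with a nearly unchanged prediction would indicate that the explanation moved while the model's output did not. An attack in the spirit of the abstract would search over candidate substitutions for that outcome, whereas this sketch only measures it for one hand-picked substitution.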

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks