Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations

Julia El Zini and Mariette Awad

arXiv:2210.08902·cs.CL·October 18, 2022

TL;DR

This paper evaluates contrastive textual explanations, focusing on their faithfulness and robustness, by extending metrics and benchmarking methods like POLYJUICE and MiCE on sentiment analysis data.

Contribution

It introduces a novel evaluation scheme for contrastive explanations in text, extending metrics like proximity, connectedness, and stability, and performs the first semantic adversarial attack on textual recourse methods.

Findings

01 POLYJUICE produces more attainable contrastive texts

02 Connectedness of counterfactuals varies across models

03 POLYJUICE demonstrates robustness in adversarial settings

Abstract

Contrastive explanation methods go beyond transparency and address the contrastive aspect of explanations. Such explanations are emerging as an attractive option to provide actionable change to scenarios adversely impacted by classifiers' decisions. However, their extension to textual data is under-explored and there is little investigation on their vulnerabilities and limitations. This work motivates textual counterfactuals by laying the ground for a novel evaluation scheme inspired by the faithfulness of explanations. Accordingly, we extend the computation of three metrics, proximity, connectedness, and stability, to textual data and we benchmark two successful contrastive methods, POLYJUICE and MiCE, on our suggested metrics. Experiments on sentiment analysis data show that the connectedness of counterfactuals to their original counterparts is not obvious in both models. …

Code & Models

Repositories

https://gitlab.com/awadailab/faithful-contrastive-explanations
Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science

Methods: Counterfactual Explanations