Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations
Julia El Zini and Mariette Awad

TL;DR
This paper evaluates contrastive textual explanations, focusing on their faithfulness and robustness, by extending metrics and benchmarking methods like POLYJUICE and MiCE on sentiment analysis data.
Contribution
It introduces a novel evaluation scheme for contrastive explanations in text, extending metrics like proximity, connectedness, and stability, and performs the first semantic adversarial attack on textual recourse methods.
Findings
POLYJUICE produces more attainable contrastive texts
Connectedness of counterfactuals varies across models
POLYJUICE demonstrates robustness in adversarial settings
Abstract
Contrastive explanation methods go beyond transparency and address the contrastive aspect of explanations. Such explanations are emerging as an attractive option to provide actionable change to scenarios adversely impacted by classifiers' decisions. However, their extension to textual data is under-explored and there is little investigation on their vulnerabilities and limitations. This work motivates textual counterfactuals by laying the ground for a novel evaluation scheme inspired by the faithfulness of explanations. Accordingly, we extend the computation of three metrics, proximity, connectedness and stability, to textual data and we benchmark two successful contrastive methods, POLYJUICE and MiCE, on our suggested metrics. Experiments on sentiment analysis data show that the connectedness of counterfactuals to their original counterparts is not obvious in both models.
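As a rough illustration of what a proximity-style measurement on text can look like, the sketch below scores how far a counterfactual sentence is from its original using a normalized token-level edit distance. This is an assumption made for illustration only: the abstract does not state how the paper computes proximity, and the function names `token_edit_distance` and `proximity` are hypothetical.

```python
def token_edit_distance(a, b):
    """Levenshtein distance between two token sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def proximity(original, counterfactual):
    """Illustrative proximity score in [0, 1]; 0 means identical token sequences.

    Assumption: whitespace tokenization and lowercasing. This is not the
    paper's definition, only a stand-in distance for demonstration.
    """
    a = original.lower().split()
    b = counterfactual.lower().split()
    if not a and not b:
        return 0.0
    return token_edit_distance(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    print(proximity("the movie was great", "the movie was terrible"))
```

A lower score means the counterfactual required fewer token edits, which is one plausible reading of a contrastive text being "attainable"; other distances (for example, embedding-based ones) would fit the same interface.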
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science
Methods: Counterfactual Explanations
