A Comparative Analysis of Counterfactual Explanation Methods for Text Classifiers

Stephen McAleese; Mark Keane

arXiv:2411.02643·cs.CL·November 6, 2024

Open Access

TL;DR

This paper compares five counterfactual explanation methods for text classifiers, highlights the strengths and weaknesses of white-box and LLM-based approaches, and suggests combining their advantages for better explanations.

Contribution

The paper evaluates five counterfactual explanation methods for a BERT text classifier on two datasets using three evaluation metrics, and reports where each family of methods is effective and where it falls short.

Findings

01

Established white-box substitution-based methods are effective at generating valid counterfactuals, that is, ones that change the classifier's output.

02

LLM-based methods produce more natural and linguistically plausible counterfactuals but often fail to change the classifier's output.

03

Combining methods could improve validity and naturalness of explanations.

Abstract

Counterfactual explanations can be used to interpret and debug text classifiers by producing minimally altered text inputs that change a classifier's output. In this work, we evaluate five methods for generating counterfactual explanations for a BERT text classifier on two datasets using three evaluation metrics. The results of our experiments suggest that established white-box substitution-based methods are effective at generating valid counterfactuals that change the classifier's output. In contrast, newer methods based on large language models (LLMs) excel at producing natural and linguistically plausible text counterfactuals but often fail to generate valid counterfactuals that alter the classifier's output. Based on these results, we recommend developing new counterfactual explanation methods that combine the strengths of established gradient-based approaches and newer LLM-based…
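To make the notion of a valid counterfactual concrete, the following is a minimal sketch of a substitution-based search. It is not any of the five methods evaluated in the paper. The classifier here is a toy keyword scorer standing in for the BERT model, and `POSITIVE`, `NEGATIVE`, `SUBSTITUTIONS`, `toy_classifier`, `find_counterfactual`, `is_valid`, and `num_token_edits` are all hypothetical names introduced for illustration only.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy lexicon; a real setup would query a trained classifier.
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "terrible", "boring"}

# Hypothetical substitution table mapping a word to candidate replacements.
SUBSTITUTIONS: Dict[str, List[str]] = {
    "good": ["bad"],
    "great": ["terrible"],
    "enjoyable": ["boring"],
    "bad": ["good"],
    "terrible": ["great"],
    "boring": ["enjoyable"],
}


def toy_classifier(text: str) -> str:
    """Label text by counting lexicon hits (assumption: stands in for BERT)."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative"


def find_counterfactual(
    text: str,
    classify: Callable[[str], str],
    substitutions: Dict[str, List[str]],
) -> Optional[str]:
    """Try single-word substitutions left to right; return the first edit
    that changes the predicted label, or None if no such edit exists."""
    original_label = classify(text)
    tokens = text.split()
    for i, token in enumerate(tokens):
        for candidate in substitutions.get(token.lower(), []):
            edited = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
            if classify(edited) != original_label:
                return edited
    return None


def is_valid(original: str, counterfactual: str, classify: Callable[[str], str]) -> bool:
    """A counterfactual is valid when it changes the classifier's output."""
    return classify(original) != classify(counterfactual)


def num_token_edits(original: str, counterfactual: str) -> int:
    """Count position-wise token differences (assumes equal token counts)."""
    a, b = original.split(), counterfactual.split()
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    source = "the film was good"
    cf = find_counterfactual(source, toy_classifier, SUBSTITUTIONS)
    print(cf, is_valid(source, cf, toy_classifier), num_token_edits(source, cf))
```

The search stops at the first label-changing edit, so the result is minimal only in the sense of a single substitution; it makes no attempt to score how natural the edited text reads, which is the dimension on which the abstract reports LLM-based methods doing better.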

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling

Methods: Linear Layer · Dropout · Dense Connections · Layer Normalization · Residual Connection · Counterfactual Explanations · Linear Warmup With Linear Decay · WordPiece · Adam