A Comparative Analysis of Counterfactual Explanation Methods for Text Classifiers
Stephen McAleese, Mark Keane

TL;DR
This paper compares five counterfactual explanation methods for text classifiers, highlighting the strengths and weaknesses of white-box and LLM-based approaches, and suggests combining their advantages for better explanations.
Contribution
It evaluates five existing counterfactual explanation methods for a BERT text classifier on two datasets using three evaluation metrics, and identifies where white-box and LLM-based approaches each succeed and fall short.
Findings
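White-box substitution-based methods are effective at generating valid counterfactuals, meaning ones that change the classifier's output.
LLM-based methods produce more natural and linguistically plausible counterfactuals but often fail to change the classifier's output.
Combining the two approaches could improve both the validity and the naturalness of explanations.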
Abstract
Counterfactual explanations can be used to interpret and debug text classifiers by producing minimally altered text inputs that change a classifier's output. In this work, we evaluate five methods for generating counterfactual explanations for a BERT text classifier on two datasets using three evaluation metrics. The results of our experiments suggest that established white-box substitution-based methods are effective at generating valid counterfactuals that change the classifier's output. In contrast, newer methods based on large language models (LLMs) excel at producing natural and linguistically plausible text counterfactuals but often fail to generate valid counterfactuals that alter the classifier's output. Based on these results, we recommend developing new counterfactual explanation methods that combine the strengths of established gradient-based approaches and newer LLM-based…
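To make the notions of a counterfactual and its validity concrete, here is a minimal Python sketch. It is not one of the five methods evaluated in the paper: the keyword-based `toy_classifier` stands in for the BERT classifier, and the `SUBSTITUTIONS` table, the function names, and the example sentences are all hypothetical, chosen only for illustration. The sketch tries single-word substitutions until the predicted label flips, then reports validity as the fraction of counterfactuals whose label differs from the original's.

```python
from typing import Callable, List, Optional

# Hypothetical toy lexicons; a real setup would use a trained model such as BERT.
POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"terrible", "bad", "awful"}

# Hypothetical substitution table (an assumption, not taken from the paper).
SUBSTITUTIONS = {
    "great": "terrible", "good": "bad", "excellent": "awful",
    "terrible": "great", "bad": "good", "awful": "excellent",
}


def toy_classifier(text: str) -> int:
    """Return 1 (positive) or 0 (negative) by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1 if score > 0 else 0


def single_substitution_counterfactual(
    text: str, classify: Callable[[str], int]
) -> Optional[str]:
    """Return the first one-word edit that changes the label, or None if none does."""
    original_label = classify(text)
    tokens = text.split()
    for i, token in enumerate(tokens):
        replacement = SUBSTITUTIONS.get(token.lower())
        if replacement is None:
            continue
        candidate = " ".join(tokens[:i] + [replacement] + tokens[i + 1:])
        if classify(candidate) != original_label:
            return candidate
    return None


def validity(
    originals: List[str], counterfactuals: List[str], classify: Callable[[str], int]
) -> float:
    """Fraction of (original, counterfactual) pairs whose predicted labels differ."""
    flipped = sum(
        classify(o) != classify(c) for o, c in zip(originals, counterfactuals)
    )
    return flipped / len(originals)


if __name__ == "__main__":
    text = "the movie was great"
    print(single_substitution_counterfactual(text, toy_classifier))
```

A candidate that reads fluently but leaves the label unchanged counts as invalid under this measure, which is the failure mode the abstract attributes to the LLM-based methods.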
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling
Methods: Linear Layer · Dropout · Dense Connections · Layer Normalization · Residual Connection · Counterfactual Explanations · Linear Warmup With Linear Decay · WordPiece · Adam
