LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
Van Bach Nguyen, Paul Youssef, Christin Seifert, Jörg Schlötterer

TL;DR
This study evaluates how well Large Language Models generate and assess counterfactual explanations in NLP, revealing their strengths in fluency but limitations in minimality and label-flipping, with implications for data augmentation and model interpretability.
Contribution
It provides a comprehensive comparison of LLMs in generating and evaluating counterfactuals for NLP tasks, highlighting their capabilities and limitations.
Findings
LLMs generate fluent counterfactuals but struggle with minimal changes.
Generating counterfactuals for Sentiment Analysis is easier than for NLI.
LLMs show bias towards original labels, affecting evaluation and augmentation.
Abstract
As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs, and evaluate their CFs, assessing both intrinsic metrics, and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs, but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than…
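The two intrinsic properties named above, minimality of the change and whether the label actually flips, can be made concrete with a small sketch. This excerpt does not give the paper's exact metric definitions, so everything here is an assumption for illustration: whitespace tokenization, a token-level Levenshtein distance normalized by the original length, and the hypothetical helpers `token_edit_distance`, `normalized_distance`, `flip_rate`, and `toy_classify` (a keyword stand-in for a real classifier).

```python
from typing import Callable, Sequence


def token_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over tokens (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(original: str, counterfactual: str) -> float:
    """Edit distance divided by the original length; lower means a more minimal edit.

    Assumption: whitespace tokenization. A real evaluation would likely use
    the tokenizer of the model under study.
    """
    orig_tokens = original.split()
    cf_tokens = counterfactual.split()
    if not orig_tokens:
        return 0.0 if not cf_tokens else 1.0
    return token_edit_distance(orig_tokens, cf_tokens) / len(orig_tokens)


def flip_rate(counterfactuals: Sequence[str],
              target_labels: Sequence[str],
              classify: Callable[[str], str]) -> float:
    """Fraction of counterfactuals that the classifier assigns the target label."""
    if not counterfactuals:
        return 0.0
    hits = sum(classify(cf) == target
               for cf, target in zip(counterfactuals, target_labels))
    return hits / len(counterfactuals)


def toy_classify(text: str) -> str:
    """Hypothetical stand-in classifier: a keyword rule, not a trained model."""
    return "negative" if "terrible" in text.split() else "positive"


if __name__ == "__main__":
    original = "the movie was great"
    candidates = ["the movie was terrible", "the movie was great"]
    for cf in candidates:
        print(cf, normalized_distance(original, cf))
    print(flip_rate(candidates, ["negative", "negative"], toy_classify))
```

Under these assumptions, a counterfactual that swaps a single word in a four-token sentence has a normalized distance of 0.25, while an unchanged sentence scores 0.0 but does not count toward the flip rate. A fluent rewrite that changes most of the sentence would flip the label at the cost of a high distance, which is the trade-off the abstract describes.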
Taxonomy
Topics: Statistical and Computational Modeling
Methods: Counterfactual Explanations · FLIP
