CEval: A Benchmark for Evaluating Counterfactual Text Generation
Van Bach Nguyen, Jörg Schlötterer, Christin Seifert

TL;DR
This paper introduces CEval, a comprehensive benchmark for evaluating counterfactual text generation methods, unifying metrics, datasets, and baselines to facilitate consistent and fair comparison.
Contribution
CEval provides a standardized benchmark with datasets, metrics, baselines, and open-source tools for evaluating counterfactual text generation methods.
Findings
No method perfectly balances counterfactual accuracy and text quality.
Methods that score well on counterfactual metrics often produce lower-quality text.
Large language models with simple prompts generate high-quality but less accurate counterfactuals.
Abstract
Counterfactual text generation aims to minimally change a text, such that it is classified differently. Judging advancements in method development for counterfactual text generation is hindered by a non-uniform usage of data sets and metrics in related work. We propose CEval, a benchmark for comparing counterfactual text generation methods. CEval unifies counterfactual and text quality metrics, includes common counterfactual datasets with human annotations, standard baselines (MICE, GDBA, CREST) and the open-source language model LLAMA-2. Our experiments found no perfect method for generating counterfactual text. Methods that excel at counterfactual metrics often produce lower-quality text while LLMs with simple prompts generate high-quality text but struggle with counterfactual criteria. By making CEval available as an open-source Python library, we encourage the community to…
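Illustrative sketch
The abstract defines a counterfactual as a minimal change to a text that makes a classifier assign a different label. The Python sketch below shows one way those two criteria could be measured. It is not CEval's API and does not reproduce its metric definitions: the names token_edit_distance, evaluate_counterfactuals, and toy_classifier are hypothetical, the label-flip rate and normalized token edit distance are assumed stand-ins for "classified differently" and "minimally change", and the keyword classifier and example pairs are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def token_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over tokens (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def evaluate_counterfactuals(
    pairs: List[Tuple[str, str]],
    classify: Callable[[str], str],
) -> Tuple[float, float]:
    """Return (flip rate, mean normalized token edit distance) for the pairs.

    Both quantities are assumptions chosen for illustration, not the
    benchmark's own definitions.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    flips = 0
    distances = []
    for original, counterfactual in pairs:
        # A counterfactual "succeeds" here if the predicted label changes.
        if classify(original) != classify(counterfactual):
            flips += 1
        orig_tokens = original.split()
        cf_tokens = counterfactual.split()
        # Normalize by the original length so long texts are not penalized.
        distances.append(
            token_edit_distance(orig_tokens, cf_tokens) / max(len(orig_tokens), 1)
        )
    return flips / len(pairs), sum(distances) / len(pairs)


def toy_classifier(text: str) -> str:
    """Hypothetical keyword-based sentiment classifier, for the demo only."""
    negative = {"bad", "boring", "terrible"}
    tokens = text.lower().split()
    return "negative" if any(tok in negative for tok in tokens) else "positive"


# Made-up (original, counterfactual) pairs.
pairs = [
    ("the movie was great", "the movie was terrible"),
    ("a boring plot", "a boring story"),
]
flip_rate, mean_distance = evaluate_counterfactuals(pairs, toy_classifier)
print(f"flip rate: {flip_rate:.2f}, mean normalized distance: {mean_distance:.3f}")
```

In this toy run only the first pair changes the predicted label, while both pairs edit a single token. A real evaluation would replace toy_classifier with the target model and would also need a separate text-quality measure, which this sketch does not attempt.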
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Advanced Text Analysis Techniques
