Making a (Counterfactual) Difference One Rationale at a Time
Mitchell Plyler, Michael Green, Min Chi

TL;DR
This paper explores how unsupervised counterfactual data augmentation can improve rationale models in NLP by reducing reliance on spurious correlations, leading to more meaningful explanations.
Contribution
It introduces an unsupervised CDA method to enhance rationale models and provides an information-theoretic analysis of dataset properties affecting success.
Findings
CDA improves rationale quality over baselines.
CDA reduces spurious correlations in rationales.
CDA-enhanced rationale models better capture the true (non-spurious) signal.
Abstract
Rationales, snippets of extracted text that explain an inference, have emerged as a popular framework for interpretable natural language processing (NLP). Rationale models typically consist of two cooperating modules: a selector and a classifier with the goal of maximizing the mutual information (MMI) between the "selected" text and the document label. Despite their promise, MMI-based methods often pick up on spurious text patterns and result in models with nonsensical behaviors. In this work, we investigate whether counterfactual data augmentation (CDA), without human assistance, can improve the performance of the selector by lowering the mutual information between spurious signals and the document label. Our counterfactuals are produced in an unsupervised fashion using class-dependent generative models. From an information-theoretic lens, we derive properties of the unaugmented…
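To make the core idea concrete, the sketch below computes the empirical mutual information I(S; Y) between a binary spurious feature S (for example, "token t appears in the document") and a binary label Y, before and after augmentation. It assumes idealized counterfactuals that flip the label while leaving the spurious feature untouched; this is a toy stand-in, not the authors' generator, which uses class-dependent generative models. The helper names `mutual_information` and `augment_with_counterfactuals` and the toy counts are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information I(S; Y), in bits, from (s, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    s_marg = Counter(s for s, _ in pairs)
    y_marg = Counter(y for _, y in pairs)
    mi = 0.0
    for (s, y), count in joint.items():
        p_sy = count / n
        p_s = s_marg[s] / n
        p_y = y_marg[y] / n
        mi += p_sy * math.log2(p_sy / (p_s * p_y))
    return mi


def augment_with_counterfactuals(pairs):
    """Hypothetical idealized CDA: for every example, add a counterfactual copy
    whose label is flipped but whose spurious feature is unchanged (we assume
    the generator rewrites only the class-relevant text)."""
    return pairs + [(s, 1 - y) for s, y in pairs]


# Toy corpus (hypothetical): the spurious token occurs in 90% of positive
# documents and in 10% of negative documents.
data = [(1, 1)] * 45 + [(0, 1)] * 5 + [(1, 0)] * 5 + [(0, 0)] * 45

print(round(mutual_information(data), 3))                                # 0.531
print(round(mutual_information(augment_with_counterfactuals(data)), 3))  # 0.0
```

In this idealized setting the spurious token carries about 0.53 bits of information about the label before augmentation and none afterwards, so an MMI-trained selector gains nothing by picking it. Real generated counterfactuals will only approximate this, which is why properties of the unaugmented data matter.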
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Sentiment Analysis and Opinion Mining
Methods: Counterfactual Explanations
