Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets

Wei Liu; Zhongyu Niu; Lang Gao; Zhiying Deng; Jun Wang; Haozhao Wang; Ruixuan Li

arXiv:2505.02118 · cs.AI · August 7, 2025


TL;DR

This paper reveals that cooperative rationalization models can unintentionally learn spurious correlations, leading to biased explanations, and proposes methods to detect and prevent such biases, improving model reliability.

Contribution

It uncovers the bias risk in cooperative rationalization frameworks and introduces techniques to mitigate spurious correlations, enhancing explanation fidelity.

Findings

01 The sampling bias introduced by the cooperative game can create incorrect correlations between selected rationales and labels.

02 The proposed attack-based inspection detects these biases.

03 The proposed method improves rationalization accuracy across multiple datasets.

Abstract

This study investigates the self-rationalization framework constructed with a cooperative game, where a generator initially extracts the most informative segment from raw input, and a subsequent predictor utilizes the selected subset for its input. The generator and predictor are trained collaboratively to maximize prediction accuracy. In this paper, we first uncover a potential caveat: such a cooperative game could unintentionally introduce a sampling bias during rationale extraction. Specifically, the generator might inadvertently create an incorrect correlation between the selected rationale candidate and the label, even when they are semantically unrelated in the original dataset. Subsequently, we elucidate the origins of this bias using both detailed theoretical analysis and empirical evidence. Our findings suggest a direction for inspecting these correlations through attacks,…
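The sampling bias described in the abstract can be illustrated with a toy example. The sketch below is not the authors' code: the four-sentence dataset, the rule-based `biased_generator`, and the helper `positive_rate` are all hypothetical, chosen only to show how a token that is uninformative in the raw inputs can become perfectly predictive once a generator decides when to select it.

```python
from typing import List, Tuple

# Hypothetical toy dataset: every input contains the filler token "the",
# so "the" carries no information about the label in the raw data.
DATASET: List[Tuple[List[str], int]] = [
    (["the", "movie", "was", "good"], 1),
    (["the", "plot", "was", "good"], 1),
    (["the", "movie", "was", "bad"], 0),
    (["the", "plot", "was", "bad"], 0),
]


def biased_generator(tokens: List[str]) -> List[str]:
    """Hypothetical generator: keeps 'the' for positive inputs, 'was' otherwise.

    It never sees the label, but it sees the full input, which determines the label.
    """
    keep = "the" if "good" in tokens else "was"
    return [t for t in tokens if t == keep]


def positive_rate(examples: List[Tuple[List[str], int]], token: str) -> float:
    """Estimate P(label = 1 | token present) over examples containing the token."""
    labels = [y for toks, y in examples if token in toks]
    return sum(labels) / len(labels)


# Rationales are what the predictor would actually be trained on.
rationales = [(biased_generator(x), y) for x, y in DATASET]

rate_in_inputs = positive_rate(DATASET, "the")
rate_in_rationales = positive_rate(rationales, "the")

print(f"P(y=1 | 'the' in input)     = {rate_in_inputs}")
print(f"P(y=1 | 'the' in rationale) = {rate_in_rationales}")
```

In this toy setting a predictor trained only on the selected rationales could reach full accuracy from "the" alone, even though the token is semantically unrelated to the label.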

Code & Models

Repositories

jugechengzi/rationalization-a2i · PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Attention Dropout · Softmax · Residual Connection · WordPiece · Linear Layer