Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning

Qian Wang; Xuandong Zhao; Zirui Zhang; Zhanzhi Lou; Nuo Chen; Dawn Song; Bingsheng He

arXiv:2602.01528·cs.CY·April 7, 2026

TL;DR

This paper introduces Epistemic Independence Training (EIT), a reinforcement learning method that reduces bias influence in large language models, improving reasoning robustness and generalization across biases and benchmarks.

Contribution

EIT is a novel RL framework that makes bias cues non-predictive of reward, leading to less biased and more transferable reasoning in LLMs.

Findings

01. EIT improves accuracy and robustness against adversarial biases.

02. Models trained with EIT generalize to unseen bias types.

03. EIT outperforms existing methods like GroupDRO and IRM across multiple benchmarks.

Abstract

Large language models (LLMs) increasingly serve as reasoners and automated evaluators, yet they remain susceptible to cognitive biases, often altering their reasoning when faced with spurious prompt-level cues such as consensus claims or authority appeals. Existing mitigations via prompting or supervised fine-tuning fail to generalize, as they modify surface behavior without changing the optimization objective that makes bias cues attractive. We propose Epistemic Independence Training (EIT), a reinforcement learning framework grounded in a key principle: to learn independence, bias cues must be made non-predictive of reward. EIT operationalizes this through a balanced conflict strategy where bias signals are equally likely to support correct and incorrect answers, combined with a reward design that penalizes bias-following without rewarding bias agreement. Experiments on…
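
The two ingredients named in the abstract, balanced conflict and a reward that penalizes bias-following without rewarding bias agreement, can be illustrated with a toy sketch. This is not the authors' implementation: the names `Example`, `make_balanced_batch`, `reward`, and `mean_reward`, the reward values, and the `penalty` parameter are all hypothetical, and the reward shape is only one plausible reading of the abstract.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    correct: str     # ground-truth answer
    cue_target: str  # answer the injected bias cue points to


def make_balanced_batch(items, seed=0):
    """Build a batch where the cue supports the correct answer in half the cases.

    items: list of (question, correct_answer, wrong_answer) tuples.
    """
    rng = random.Random(seed)
    batch = []
    for i, (question, correct, wrong) in enumerate(items):
        # Alternate the cue direction so it carries no information about correctness.
        cue_target = correct if i % 2 == 0 else wrong
        batch.append(Example(question, correct, cue_target))
    rng.shuffle(batch)
    return batch


def reward(example, answer, penalty=0.5):
    """Assumed reward shape (hypothetical values)."""
    if answer == example.correct:
        return 1.0       # same reward whether or not the cue agreed: no bonus for agreement
    if answer == example.cue_target:
        return -penalty  # wrong answer reached by following the cue
    return 0.0           # wrong answer unrelated to the cue


def mean_reward(batch, policy):
    return sum(reward(ex, policy(ex)) for ex in batch) / len(batch)


items = [(f"q{i}", "A", "B") for i in range(8)]
batch = make_balanced_batch(items)

follow_cue = lambda ex: ex.cue_target     # always trusts the bias cue
answer_correctly = lambda ex: ex.correct  # ignores the cue

print(mean_reward(batch, follow_cue))        # 0.25
print(mean_reward(batch, answer_correctly))  # 1.0
```

Under these assumptions, a policy that always follows the cue is right only half the time and is penalized the other half, so the cue stops being a useful shortcut for maximizing reward.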

Code & Models

Repositories

https://anonymous.4open.science/r/bias-mitigation-with-rl-BC47
