Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning
Jiayun Wu, Jiashuo Liu, Zhiyuan Zeng, Tianyang Zhan, Tianle Cai, Wenhao Huang

TL;DR
This paper introduces behaviorally calibrated reinforcement learning to reduce hallucinations in large language models by encouraging uncertainty estimation and abstention, improving factual reliability without sacrificing accuracy.
Contribution
It proposes and evaluates training methods that optimize proper scoring rules, enabling models to better calibrate their confidence and abstain when uncertain.
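A proper scoring rule rewards a stated confidence so that the expected reward is maximized by reporting the true probability of being correct. The sketch below illustrates this property with the Brier score. This is an assumption for illustration only: the summary does not say which scoring rule the paper optimizes, and `brier_reward` and `expected_reward` are hypothetical helper names.

```python
def brier_reward(confidence: float, correct: bool) -> float:
    """Negative Brier score for a stated confidence (assumed scoring rule).

    Higher is better. The maximum reward of 0 is reached by stating
    confidence 1.0 for a correct answer or 0.0 for an incorrect one.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    outcome = 1.0 if correct else 0.0
    return -(confidence - outcome) ** 2


def expected_reward(confidence: float, p_correct: float) -> float:
    """Expected Brier reward when the answer is correct with probability p_correct."""
    return (p_correct * brier_reward(confidence, True)
            + (1.0 - p_correct) * brier_reward(confidence, False))


# Scan a grid of stated confidences: because the rule is proper, the best
# report equals the true correctness probability (here 0.7).
grid = [i / 100 for i in range(101)]
best = max(grid, key=lambda c: expected_reward(c, 0.7))
```

Under a proper rule, overstating or understating confidence strictly lowers the expected reward. This is the property that lets a reward signal push a model toward calibrated confidence instead of uniformly high confidence.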
Findings
Smaller models outperform larger ones in uncertainty calibration.
The model's accuracy-to-hallucination ratio improves significantly.
Zero-shot calibration error matches that of frontier models in factual QA.
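One common way to measure calibration error is binned expected calibration error (ECE). It groups predictions by stated confidence and averages the gap between mean confidence and accuracy in each bin, weighted by bin size. Whether the paper reports this exact binned variant is an assumption, and `expected_calibration_error` is a hypothetical helper written only for illustration.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Binned ECE with equal-width confidence bins over [0, 1] (assumed variant)."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1.0 for _, y in b if y) / len(b)
        # Weight each bin's |confidence - accuracy| gap by its share of samples.
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that always claims full confidence but is right only half the time has an ECE of 0.5. A model that says 0.9 and is right 90% of the time has an ECE near 0.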
Abstract
LLM deployment in critical domains is currently impeded by persistent hallucinations: the generation of plausible but factually incorrect assertions. While scaling laws have driven significant improvements in general capabilities, theoretical frameworks suggest that hallucination is not merely stochastic error but a predictable statistical consequence of training objectives that prioritize mimicking the data distribution over epistemic honesty. Standard reinforcement learning with verifiable rewards (RLVR) paradigms, which use binary reward signals, inadvertently incentivize models to be good test-takers rather than honest communicators, encouraging them to guess whenever the probability of being correct exceeds zero. This paper presents an exhaustive investigation into behavioral calibration, which incentivizes models to stochastically admit uncertainty by abstaining when not confident, aligning model behavior with accuracy. Synthesizing recent advances, we propose and…
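The claim that binary rewards encourage guessing whenever the probability of being correct exceeds zero can be checked with a short expected-value calculation. The sketch assumes a hypothetical reward of +1 for a correct answer, a penalty for a wrong answer, and 0 for abstaining. With no penalty, which is the binary reward, answering always beats abstaining once p > 0. A penalty of t/(1 - t) moves the break-even point to a confidence threshold t. These reward values, and the names `should_answer` and `penalty_for_threshold`, are illustrative assumptions, not the paper's reward design.

```python
ABSTAIN_REWARD = 0.0  # assumed reward for saying "I don't know"


def expected_answer_reward(p_correct: float, wrong_penalty: float) -> float:
    """Expected reward of answering: +1 if correct, -wrong_penalty if wrong (assumed)."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """A reward-maximizing policy answers only if answering beats abstaining."""
    return expected_answer_reward(p_correct, wrong_penalty) > ABSTAIN_REWARD


def penalty_for_threshold(t: float) -> float:
    """Penalty that makes answering worthwhile exactly when p_correct > t.

    Setting p - (1 - p) * k = 0 at p = t gives k = t / (1 - t).
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    return t / (1.0 - t)


# Binary reward (no penalty): even a 1% chance of being right favors guessing.
binary_guess = should_answer(0.01, wrong_penalty=0.0)

# Confidence target t = 0.75 gives penalty 3: answer only above 75% confidence.
k = penalty_for_threshold(0.75)
answer_at_80 = should_answer(0.80, k)
answer_at_70 = should_answer(0.70, k)
```

Under the binary reward, abstaining is never strictly better than guessing. This matches the test-taker incentive described above, and it is why an abstention-aware or calibration-aware reward is needed to make "I don't know" a rational output.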
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks
