Honesty over Accuracy: Trustworthy Language Models through Reinforced Hesitation

Mohamad Amin Mohamadi; Tianhao Wang; Zhiyuan Li

arXiv:2511.11500·cs.LG·November 25, 2025

TL;DR

This paper introduces Reinforced Hesitation, a training method for language models that encourages them to abstain when uncertain, improving trustworthiness by reducing hallucinations and enabling better risk management.

Contribution

It proposes Reinforced Hesitation with ternary rewards for training models to abstain, and introduces inference strategies that leverage abstention for safer, more trustworthy responses.

Findings

01

Models trained with RH can effectively balance accuracy and abstention.

02

Abstention strategies outperform majority voting in reducing errors.

03

Reinforced Hesitation creates models that are more honest about their limitations.

Abstract

Modern language models fail a fundamental requirement of trustworthy intelligence: knowing when not to answer. Despite achieving impressive accuracy on benchmarks, these models produce confident hallucinations, even when wrong answers carry catastrophic consequences. Our evaluations on GSM8K, MedQA and GPQA show frontier models almost never abstain despite explicit warnings of severe penalties, suggesting that prompts cannot override training that rewards any answer over no answer. As a remedy, we propose Reinforced Hesitation (RH): a modification to Reinforcement Learning from Verifiable Rewards (RLVR) to use ternary rewards ($+1$ correct, $0$ abstention, $-\lambda$ error) instead of binary. Controlled experiments on logic puzzles reveal that varying $\lambda$ produces distinct models along a Pareto frontier, where each training penalty yields the optimal model for its corresponding risk…
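The ternary reward in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of `None` as the abstention marker, and exact string matching as the correctness check are all assumptions made for illustration. The threshold function follows from simple arithmetic on the stated rewards: if a model is correct with probability p, answering has expected reward p - λ(1 - p), which exceeds the abstention reward of 0 only when p > λ / (1 + λ).

```python
from typing import Optional


def ternary_reward(prediction: Optional[str], target: str, lam: float) -> float:
    """Ternary reward: +1 if correct, 0 if abstaining, -lam if wrong.

    Assumption: `None` marks an abstention and correctness is an exact
    string match; a real verifier would be task specific.
    """
    if prediction is None:
        return 0.0
    return 1.0 if prediction == target else -lam


def answer_threshold(lam: float) -> float:
    """Smallest confidence at which answering beats abstaining in expectation.

    Derived from p - lam * (1 - p) > 0, i.e. p > lam / (1 + lam).
    """
    return lam / (1.0 + lam)


def should_answer(p_correct: float, lam: float) -> bool:
    """True when the expected reward of answering is above 0 (abstention)."""
    return p_correct - lam * (1.0 - p_correct) > 0.0


if __name__ == "__main__":
    for lam in (0.0, 1.0, 3.0):
        print(f"lam={lam}: answer only if confidence > {answer_threshold(lam):.2f}")
```

Under these assumptions, λ = 0 recovers the binary setting in which any answer is at least as good as none, while larger λ raises the confidence a model needs before answering is worthwhile.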

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Topic Modeling