Rewarding Intellectual Humility: Learning When Not to Answer in Large Language Models

Abha Jha; Akanksha Mahajan; Ashwath Vaithinathan Aravindan; Praveen Saravanan; Sai Sailaja Policharla; Sonal Chaturbhuj Gehlot

arXiv:2601.20126·cs.CL·January 29, 2026

TL;DR

This paper explores a reinforcement learning approach that explicitly rewards models for abstaining in order to reduce hallucinations, evaluating it on two benchmarks (MedMCQA and Hendrycks Math) and two instruction-tuned models (Granite-3.3-2B-Instruct and Qwen-3-4B-Instruct).

Contribution

It introduces a verifiable reward framework for training LLMs to abstain, improving factual accuracy and reliability in open-ended and multiple-choice tasks.

Findings

01 Moderate abstention rewards reduce incorrect answers.

02 Larger models are more robust to abstention incentives.

03 Supervised abstention training helps mitigate exploration limitations.

Abstract

Large Language Models (LLMs) often produce hallucinated or unverifiable content, undermining their reliability in factual domains. This work investigates Reinforcement Learning with Verifiable Rewards (RLVR) as a training paradigm that explicitly rewards abstention ("I don't know") alongside correctness to promote intellectual humility. We fine-tune and evaluate Granite-3.3-2B-Instruct and Qwen-3-4B-Instruct on the MedMCQA and Hendrycks Math benchmarks using a ternary reward structure $(-1, r_{\text{abs}}, 1)$ under varying abstention reward structures. We further study the effect of combining RLVR with supervised fine-tuning strategies that teach abstention prior to reinforcement learning. Our results show that moderate abstention rewards ($r_{\text{abs}} \approx -0.25$ to $0.3$) consistently reduce incorrect responses without severe accuracy degradation on multiple-choice tasks, with larger models…
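
A minimal sketch of how the ternary reward described in the abstract might be computed. This is not the authors' implementation. It assumes that $-1$ is assigned to an incorrect answer, $r_{\text{abs}}$ to an abstention, and $1$ to a correct answer, that abstention is detected by the literal phrase "I don't know", and that correctness is an exact match after normalization. The names `ternary_reward`, `abstention_threshold`, and `ABSTAIN_PHRASE` are hypothetical.

```python
# Hypothetical abstention marker; the abstract only quotes "I don't know".
ABSTAIN_PHRASE = "i don't know"


def ternary_reward(response: str, gold: str, r_abs: float) -> float:
    """Return r_abs for an abstention, +1 for a correct answer, -1 otherwise.

    Assumption: correctness is checked by case-insensitive exact match,
    which suits multiple-choice labels but not free-form math answers.
    """
    normalized = response.strip().lower()
    if ABSTAIN_PHRASE in normalized:
        return r_abs
    return 1.0 if normalized == gold.strip().lower() else -1.0


def abstention_threshold(r_abs: float) -> float:
    """Confidence below which abstaining has the higher expected reward.

    Answering with probability p of being correct earns
    p * 1 + (1 - p) * (-1) = 2p - 1 in expectation, so abstaining
    is preferable whenever 2p - 1 < r_abs, i.e. p < (1 + r_abs) / 2.
    """
    return (1.0 + r_abs) / 2.0
```

Under these assumptions, the threshold is one way to read the reported range: with $r_{\text{abs}} = -0.25$ a reward-maximizing policy would abstain only when its chance of being correct falls below 0.375, while with $r_{\text{abs}} = 0.3$ it would abstain below 0.65.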

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning