KARL: Mitigating Hallucinations in LLMs via Knowledge-Boundary-Aware Reinforcement Learning

Cheng Gao; Cheng Huang; Kangyang Luo; Ziqing Qiao; Shuzheng Si; Huimin Chen; Chaojun Xiao; Maosong Sun

arXiv:2604.22779 · cs.LG · April 28, 2026

TL;DR

KARL is a reinforcement learning framework that aligns an LLM's abstention behavior with its knowledge boundary, so the model declines questions it cannot answer, which reduces hallucinations while maintaining answer accuracy.

Contribution

KARL introduces a Knowledge-Boundary-Aware Reward, which estimates the knowledge boundary online from within-group response statistics, and a Two-Stage RL Training Strategy, so that abstention tracks the model's actual knowledge limits.

Findings

01. KARL achieves a better accuracy-hallucination trade-off across benchmarks.

02. It effectively suppresses hallucinations without sacrificing answer accuracy.

03. The method performs well on both in-distribution and out-of-distribution data.

Abstract

Enabling large language models (LLMs) to appropriately abstain from answering questions beyond their knowledge is crucial for mitigating hallucinations. While existing reinforcement learning methods foster autonomous abstention, they often compromise answer accuracy because their static reward mechanisms, agnostic to models' knowledge boundaries, drive models toward excessive caution. In this work, we propose KARL, a novel framework that continuously aligns an LLM's abstention behavior with its evolving knowledge boundary. KARL introduces two core innovations: a Knowledge-Boundary-Aware Reward that performs online knowledge boundary estimation using within-group response statistics, dynamically rewarding correct answers or guided abstention; and a Two-Stage RL Training Strategy that first explores the knowledge boundary and bypasses the "abstention trap", and subsequently converts…
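
To make the idea of a reward driven by within-group response statistics more concrete, the following is a minimal sketch and not the authors' formulation. It assumes each question is sampled several times, each response is already labeled as correct, wrong, or an abstention, and the fraction of correct responses in the group serves as the online knowledge boundary estimate. The function name `boundary_aware_rewards`, the `known_threshold` parameter, and all reward values are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List

# Hypothetical response labels; the paper's actual labeling scheme is not given here.
CORRECT, WRONG, ABSTAIN = "correct", "wrong", "abstain"
VALID_LABELS = {CORRECT, WRONG, ABSTAIN}


def boundary_aware_rewards(labels: List[str], known_threshold: float = 0.5) -> List[float]:
    """Return one reward per sampled response for a single question.

    Assumption: the share of correct responses within the group is used as
    a proxy for whether the question lies inside the model's knowledge
    boundary. The threshold and reward magnitudes are illustrative only.
    """
    if not labels:
        return []
    unknown = set(labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    counts = Counter(labels)
    group_accuracy = counts[CORRECT] / len(labels)
    within_boundary = group_accuracy >= known_threshold

    rewards = []
    for label in labels:
        if label == CORRECT:
            # Correct answers are always rewarded.
            rewards.append(1.0)
        elif label == ABSTAIN:
            # Abstention earns credit only when the question looks unknown.
            rewards.append(0.0 if within_boundary else 0.5)
        else:
            # Wrong answers (hallucinations) are penalized.
            rewards.append(-1.0)
    return rewards


if __name__ == "__main__":
    # Group where the model mostly knows the answer: abstaining earns nothing.
    print(boundary_aware_rewards([CORRECT, CORRECT, ABSTAIN, WRONG]))
    # Group where the model mostly does not know: abstaining earns partial credit.
    print(boundary_aware_rewards([WRONG, ABSTAIN, ABSTAIN, CORRECT]))
```

In this sketch the same abstention receives a different reward depending on how the rest of the group performed, which is the contrast with a static reward that pays a fixed amount for abstaining regardless of what the model knows.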
