Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs

Chelsea Zou; Yiheng Yao; Basant Khalil

arXiv:2511.15921 · cs.AI · November 21, 2025

TL;DR

This paper introduces a reinforcement learning framework that uses confidence and entropy signals to detect and reduce hallucinations in LLMs, enhancing reasoning stability and faithfulness.

Contribution

It presents a novel self-correcting RL approach that leverages fine-grained uncertainty signals to improve LLM reasoning and reduce hallucinations.

Findings

1. Improves final-answer accuracy in LLMs.
2. Enhances reasoning calibration and faithfulness.
3. Validates the individual contribution of each uncertainty signal.

Abstract

This project develops a self-correcting framework for large language models (LLMs) that detects and mitigates hallucinations during multi-step reasoning. Rather than relying solely on final-answer correctness, our approach leverages two fine-grained uncertainty signals, 1) self-assessed confidence alignment and 2) token-level entropy spikes, to detect unreliable and unfaithful reasoning in real time. We design a composite reward function that penalizes unjustified high confidence and entropy spikes while encouraging stable and accurate reasoning trajectories. These signals guide a reinforcement learning (RL) policy that makes the model more introspective and shapes its generation behavior through confidence-aware reward feedback, improving not just outcome correctness but also the coherence and faithfulness of its intermediate reasoning steps. Experiments show that our method improves…
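The abstract does not give the exact form of the composite reward. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that token entropy is the Shannon entropy (in nats) of each next-token distribution. It also assumes that a "spike" is a token whose entropy exceeds the previous token's by more than a fixed threshold, and that "unjustified high confidence" means stated confidence above the 0/1 correctness outcome. Finally, it assumes the terms are combined linearly. The names `composite_reward`, `count_entropy_spikes`, `w_conf`, `w_spike`, and `spike_threshold` are hypothetical, and the policy-optimization step that would consume this reward is not shown.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a single next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def count_entropy_spikes(entropies: Sequence[float], threshold: float = 1.0) -> int:
    """Hypothetical spike rule (an assumption, not from the paper): a token is a
    spike when its entropy rises more than `threshold` nats above the previous token's."""
    return sum(1 for prev, cur in zip(entropies, entropies[1:]) if cur - prev > threshold)


def composite_reward(
    correct: bool,
    confidence: float,
    entropies: Sequence[float],
    w_conf: float = 1.0,         # hypothetical weight on the overconfidence penalty
    w_spike: float = 0.1,        # hypothetical per-spike penalty
    spike_threshold: float = 1.0,
) -> float:
    """Hypothetical linear composite reward: a correctness term, minus a penalty for
    confidence not justified by the outcome, minus a penalty per entropy spike."""
    r_correct = 1.0 if correct else 0.0
    # Confidence in [0, 1] above the 0/1 outcome is treated as "unjustified";
    # with this rule only wrong answers stated with confidence are penalized.
    overconfidence = max(0.0, confidence - r_correct)
    spikes = count_entropy_spikes(entropies, spike_threshold)
    return r_correct - w_conf * overconfidence - w_spike * spikes


if __name__ == "__main__":
    # Per-token entropies for one reasoning trace: one jump of 1.4 nats is a spike.
    trace = [0.1, 0.2, 1.6, 0.3]
    print(composite_reward(False, 0.9, trace))  # wrong and confident: about -1.0
    print(composite_reward(True, 0.9, trace))   # correct, one spike: about 0.9
```

A one-sided overconfidence penalty is used here because the abstract speaks of penalizing "unjustified high confidence". A squared calibration error such as `(confidence - r_correct) ** 2` would also penalize underconfidence and is an equally plausible reading.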

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications