Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models

Junyi Li; Hwee Tou Ng

arXiv:2505.24630 · cs.CL · November 7, 2025

TL;DR

This paper finds that reasoning-oriented reinforcement learning fine-tuning of large language models increases hallucinations, and proposes FSPO (Factuality-aware Step-wise Policy Optimization), a method that verifies factuality at each reasoning step to reduce hallucinations and improve reasoning accuracy.

Contribution

The paper introduces FSPO, a novel RL fine-tuning algorithm that incorporates factuality verification at each reasoning step to mitigate hallucinations in large language models.

Findings

01. FSPO reduces hallucinations effectively.

02. FSPO improves reasoning accuracy.

03. Models fine-tuned with FSPO outperform baselines.

Abstract

Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization, achieving impressive capabilities across various challenging benchmarks. However, our empirical analysis reveals a critical drawback: reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations. We theoretically analyze the RL training dynamics, identifying high-variance gradient, entropy-induced randomness, and susceptibility to spurious local optima as key factors leading to hallucinations. To address this drawback, we propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification at each reasoning step. FSPO leverages automated verification against given evidence to dynamically adjust token-level advantage values, incentivizing factual…
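The abstract's idea of using step-level verification to adjust token-level advantages can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the three-valued labels (+1 supported, 0 neutral, -1 contradicted), and the sign rule are all assumptions made for illustration, and the verifier that would produce the labels is left out.

```python
from typing import List


def adjust_token_advantages(
    base_advantage: float,
    step_token_counts: List[int],
    step_factuality: List[int],
) -> List[float]:
    """Expand one response-level advantage into per-token advantages.

    Hypothetical rule (an assumption, not taken from the paper):
      label +1 (step supported by evidence)    -> +|A| for its tokens
      label -1 (step contradicted by evidence) -> -|A| for its tokens
      label  0 (no verdict)                    -> keep A unchanged
    """
    if len(step_token_counts) != len(step_factuality):
        raise ValueError("one factuality label is required per step")

    magnitude = abs(base_advantage)
    token_advantages: List[float] = []
    for n_tokens, label in zip(step_token_counts, step_factuality):
        if label > 0:
            step_advantage = magnitude
        elif label < 0:
            step_advantage = -magnitude
        else:
            step_advantage = base_advantage
        # Every token in a step shares that step's adjusted advantage.
        token_advantages.extend([step_advantage] * n_tokens)
    return token_advantages


# Three steps of 2, 1, and 3 tokens; the second step contradicts the evidence.
advantages = adjust_token_advantages(0.5, [2, 1, 3], [1, -1, 0])
```

Under this assumed rule, a response with a correct final answer still receives a negative signal on the tokens of any step the verifier marks as contradicted, which is the kind of step-wise incentive the abstract describes.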

Code & Models

Repositories

nusnlp/fspo
PyTorch · Official

Videos

Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models · SlidesLive

Taxonomy

Topics: Computability, Logic, AI Algorithms · Blockchain Technology Applications and Security · Mental Health and Psychiatry

Methods: LLaMA