Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models
Junyi Li, Hwee Tou Ng

TL;DR
This paper finds that reasoning-oriented reinforcement learning (RL) fine-tuning of large language models increases hallucinations, and proposes Factuality-aware Step-wise Policy Optimization (FSPO), an RL fine-tuning method that reduces hallucinations and improves reasoning accuracy.
Contribution
The paper introduces FSPO, a novel RL fine-tuning algorithm that incorporates factuality verification at each reasoning step to mitigate hallucinations in large language models.
Findings
FSPO reduces hallucinations effectively.
FSPO improves reasoning accuracy.
Models fine-tuned with FSPO outperform baselines.
Abstract
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization, achieving impressive capabilities across various challenging benchmarks. However, our empirical analysis reveals a critical drawback: reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations. We theoretically analyze the RL training dynamics, identifying high-variance gradient, entropy-induced randomness, and susceptibility to spurious local optima as key factors leading to hallucinations. To address this drawback, we propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification at each reasoning step. FSPO leverages automated verification against given evidence to dynamically adjust token-level advantage values, incentivizing factual…
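The abstract describes adjusting token-level advantage values according to a per-step factuality check. A minimal sketch of that idea follows. Everything specific in it is an assumption for illustration, not the paper's formulation: the group-normalized (GRPO-style) advantage, the factuality labels (+1 supported by evidence, -1 contradicted, 0 unverifiable), the sign-flipping rule, and the function names `group_advantages` and `token_advantages` are all hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Normalize rewards within a group of sampled responses.

    Assumption: a GRPO-style baseline (subtract the group mean, divide by
    the group standard deviation). The source does not state the base
    algorithm.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def token_advantages(advantage, step_lengths, step_factuality):
    """Spread one response-level advantage over tokens, step by step.

    step_lengths:    number of tokens in each reasoning step
    step_factuality: +1 (supported), -1 (contradicted), 0 (unverifiable)

    Hypothetical rule: tokens in a supported step are always rewarded,
    tokens in a contradicted step are always penalized, and tokens in an
    unverifiable step keep the response-level advantage unchanged.
    """
    per_token = []
    for n_tokens, label in zip(step_lengths, step_factuality):
        if label > 0:
            value = abs(advantage)
        elif label < 0:
            value = -abs(advantage)
        else:
            value = advantage
        per_token.extend([value] * n_tokens)
    return per_token


if __name__ == "__main__":
    advs = group_advantages([1.0, 0.0])
    # The second response had a wrong final answer, but its first step
    # was factual and its last step was not.
    print(token_advantages(advs[1], [2, 1, 3], [1, 0, -1]))
```

Under this rule, a response with a wrong final answer can still receive positive credit on the tokens of its factual steps, which is one plausible way to read "incentivizing factual" reasoning at the step level.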
Taxonomy
Topics: Computability, Logic, AI Algorithms · Blockchain Technology Applications and Security · Mental Health and Psychiatry
Methods: LLaMA
