Trustworthy Reasoning: Evaluating and Enhancing Factual Accuracy in LLM Intermediate Thought Processes
Rui Jiao, Yue Zhang, Jinku Li

TL;DR
This paper introduces a comprehensive framework to improve factual accuracy in LLM intermediate reasoning steps, combining fact-checking, reinforcement learning, and interpretability to address vulnerabilities in high-stakes applications.
Contribution
It presents a novel integrated approach that enhances factual robustness in LLM reasoning, including a fact-checker, a multi-objective reinforcement learning method, and interpretability tools.
Findings
Leading models show only ~82% factual accuracy in reasoning.
Our method improves factual robustness by up to 49.90%.
Enhanced models perform well on Math-500, AIME-2024, and GPQA benchmarks.
Abstract
We present a novel framework addressing a critical vulnerability in Large Language Models (LLMs): the prevalence of factual inaccuracies within intermediate reasoning steps despite correct final answers. This phenomenon poses substantial risks in high-stakes domains including healthcare, legal analysis, and scientific research, where erroneous yet confidently presented reasoning can mislead users into dangerous decisions. Our framework integrates three core components: (1) a specialized fact-checking classifier trained on counterfactually augmented data to detect subtle factual inconsistencies within reasoning chains; (2) an enhanced Group Relative Policy Optimization (GRPO) reinforcement learning approach that balances factuality, coherence, and structural correctness through multi-dimensional rewards; and (3) a mechanistic interpretability method examining how factuality improvements…
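To make component (2) concrete, here is a minimal sketch of how a multi-dimensional reward might feed a group-relative advantage of the kind GRPO uses. It is an illustration under stated assumptions, not the authors' implementation: the weights, the weighted-sum combination, the score values, and the function names `combined_reward` and `group_relative_advantages` are all hypothetical, and the abstract does not specify how the three reward dimensions are actually combined.

```python
from statistics import mean, pstdev

# Hypothetical weights: the source does not say how the three rewards are weighted.
WEIGHTS = {"factuality": 0.5, "coherence": 0.3, "structure": 0.2}


def combined_reward(scores, weights=WEIGHTS):
    """Collapse per-dimension scores into one scalar via a weighted sum (an assumption)."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Each response's advantage is its reward minus the group mean, divided by
    the group standard deviation. Population std is used here as a simplifying choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up scores for three responses sampled for the same prompt.
group = [
    {"factuality": 0.9, "coherence": 0.8, "structure": 1.0},
    {"factuality": 0.4, "coherence": 0.9, "structure": 1.0},
    {"factuality": 0.7, "coherence": 0.6, "structure": 0.0},
]

rewards = [combined_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

In this sketch a response with weak factuality receives a lower advantage than a more factual one from the same group, even when its coherence and structure scores are high, which is the kind of trade-off a multi-dimensional reward is meant to express.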
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Topic Modeling
