MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models

Xinming Wang; Jian Xu; Bin Yu; Sheng Lian; Hongzhu Yi; Yi Chen; Yingjian Zhu; Boran Wang; Hongming Yang; Han Hu; Xu-Yao Zhang; Cheng-Lin Liu

arXiv:2510.24794·cs.CL·January 6, 2026

TL;DR

MR-ALIGN is a novel framework that improves factual accuracy in large reasoning models by aligning their reasoning process through meta-reasoning and transition-aware rewards, without external verifiers.

Contribution

The paper introduces a meta-reasoning informed alignment method that enhances factuality by reinforcing beneficial reasoning patterns during the model's thinking process.

Findings

1. Consistently improves accuracy across multiple datasets
2. Reduces misleading reasoning in large models
3. Enhances factuality without external verification

Abstract

Large reasoning models (LRMs) show strong capabilities in complex reasoning, yet their marginal gains on evidence-dependent factual questions are limited. We find that this limitation is partly attributable to a reasoning-answer hit gap, in which the model identifies the correct facts during reasoning but fails to incorporate them into its final response, reducing factual fidelity. To address this issue, we propose MR-ALIGN, a Meta-Reasoning informed alignment framework that enhances factuality without relying on external verifiers. MR-ALIGN quantifies state transition probabilities along the model's thinking process and constructs a transition-aware implicit reward that reinforces beneficial reasoning patterns while suppressing defective ones at the level of atomic thinking segments. This re-weighting reshapes token-level signals into probability-aware segment scores, encouraging coherent…
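The abstract does not give the reward formula, so what follows is only a minimal sketch, under stated assumptions, of how transition probabilities could re-weight token-level signals into segment scores. It assumes each thinking segment has already been tagged with a state label (the labels `recall`, `verify` and `answer` are hypothetical). It estimates first-order transition probabilities P(next | prev) from labeled trajectories with add-alpha smoothing. As the token-level signal it uses a DPO-style implicit reward: beta times the log-probability ratio between the policy and a reference model. The function names `transition_probs` and `segment_scores` are illustrative and do not come from the paper.

```python
from collections import defaultdict


def transition_probs(trajectories, states, alpha=1.0):
    """Estimate P(next_state | prev_state) from labeled segment sequences.

    Hypothetical helper: each trajectory is a list of state labels, one per
    thinking segment. Add-alpha smoothing keeps unseen transitions nonzero.
    """
    counts = {s: defaultdict(float) for s in states}
    for traj in trajectories:
        # Count consecutive (prev, next) label pairs.
        for prev, nxt in zip(traj, traj[1:]):
            counts[prev][nxt] += 1.0
    probs = {}
    for prev in states:
        total = sum(counts[prev].values()) + alpha * len(states)
        probs[prev] = {nxt: (counts[prev][nxt] + alpha) / total for nxt in states}
    return probs


def segment_scores(token_logp_policy, token_logp_ref, segments, labels, probs, beta=0.1):
    """Turn token-level implicit rewards into transition-weighted segment scores.

    Assumption (not the paper's formula): a segment's score is the sum of its
    DPO-style token rewards, scaled by the probability of the transition that
    led into the segment's state.
    segments: list of (start, end) token spans, end exclusive.
    labels: one state label per segment.
    """
    scores = []
    for k, ((start, end), label) in enumerate(zip(segments, labels)):
        # Token-level implicit reward: beta * (log pi_theta - log pi_ref).
        reward = sum(
            beta * (token_logp_policy[t] - token_logp_ref[t]) for t in range(start, end)
        )
        # The first segment has no predecessor, so it keeps weight 1.0.
        weight = probs[labels[k - 1]][label] if k > 0 else 1.0
        scores.append(weight * reward)
    return scores


if __name__ == "__main__":
    states = ["recall", "verify", "answer"]
    probs = transition_probs([["recall", "verify", "answer"], ["recall", "answer"]], states)
    scores = segment_scores(
        [-1.0, -0.5, -0.2, -0.1],  # per-token log-probs under the policy
        [-1.2, -0.5, -0.4, -0.3],  # per-token log-probs under the reference model
        [(0, 2), (2, 4)],
        ["recall", "verify"],
        probs,
    )
    print(scores)
```

In this sketch, a segment reached through a frequent transition keeps more of its reward, and one reached through a rare transition is damped. One plausible variant, also an assumption, is to fit the transition table only on trajectories whose final answers were correct, so that transitions common in successful reasoning are reinforced and the rest are suppressed.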
