Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries
Luoxin Chen, Yichi Zhou, Huishuai Zhang

TL;DR
This paper introduces PRoSFI, a reward method that improves the reliability of multi-step reasoning in language models by verifying structured intermediate steps with formal proof tools.
Contribution
It proposes a novel reward framework that guides language models to generate verifiable reasoning steps, enhancing trustworthiness without needing formal proof generation.
Findings
PRoSFI improves reasoning reliability in 7B models.
Formal verification of intermediate steps increases answer credibility.
The method maintains accuracy while enhancing trustworthiness.
Abstract
Large language models (LLMs) have recently demonstrated impressive performance on complex, multi-step reasoning tasks, especially when post-trained with outcome-rewarded reinforcement learning Guo et al. 2025. However, it has been observed that outcome rewards often overlook flawed intermediate steps, leading to unreliable reasoning steps even when final answers are correct. To address this unreliable reasoning, we propose PRoSFI (Process Reward over Structured Formal Intermediates), a novel reward method that enhances reasoning reliability without compromising accuracy. Instead of generating formal proofs directly, which is rarely accomplishable for a modest-sized (7B) model, the model outputs structured intermediate steps aligned with its natural language reasoning. Each step is then verified by a formal prover. Only fully validated reasoning chains receive high rewards. The…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
