Pseudo-Formalization for Automatic Proof Verification

Slim Barkallah; Luke Bailey; Kaiyue Wen; Mohammed Abouzaid; Tengyu Ma

arXiv:2605.20531·cs.LO·May 21, 2026

Pseudo-Formalization for Automatic Proof Verification

Slim Barkallah, Luke Bailey, Kaiyue Wen, Mohammed Abouzaid, Tengyu Ma

PDF

TL;DR

This paper introduces Pseudo-Formalization, a flexible proof format that combines natural language and formal verification, enabling AI systems to verify complex mathematical proofs more effectively.

Contribution

The paper proposes Pseudo-Formalization and Block Verification, a novel approach to verify natural language proofs by translating them into a modular, formal-like format.

Findings

01

PF+BV outperforms LLM-as-judge baselines in error detection

02

PF+BV achieves better precision and recall on benchmark datasets

03

Research-level proof verification benchmark ArxivMathGradingBench is released

Abstract

Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs, in languages like Lean, are easy to verify because they are unambiguous and modular. Most proofs, particularly those written by AI systems, have neither property, and translating them into formal languages remains challenging in many frontier math settings. We propose Pseudo-Formalization (PF), a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of a regular natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm we call Block Verification (BV). We…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.