On Learning Verifiers and Implications to Chain-of-Thought Reasoning

Maria-Florina Balcan; Avrim Blum; Zhiyuan Li; Dravyansh Sharma

arXiv:2505.22650 · cs.LG · February 16, 2026

TL;DR

This paper studies how to learn reliable verifiers for natural-language Chain-of-Thought reasoning, so that invalid steps in solutions to complex problems can be caught. It gives a formal PAC-learning framework and analyzes verification goals of different strengths.

Contribution

The paper introduces a formal PAC-learning framework for verifiers of Chain-of-Thought reasoning and analyzes both their learnability and their limitations.

Findings

01. Sample complexity upper bounds for learning verifiers.

02. Lower bounds and impossibility results for certain verification goals.

03. Analysis of verification goals at different strength levels.
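
To give a feel for what a sample complexity upper bound looks like, the sketch below evaluates the classical PAC bound for a finite hypothesis class in the realizable setting, m >= (ln|H| + ln(1/delta)) / epsilon. This is the textbook bound, used here only as an illustration; it is an assumption that it resembles the paper's results, and the paper's own bounds may take a different form. The function name and parameters are hypothetical.

```python
import math


def finite_class_sample_bound(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Textbook realizable-case PAC bound for a finite hypothesis class.

    Returns the smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon.
    Illustrative only: this is NOT taken from the paper.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)


if __name__ == "__main__":
    # 1000 candidate verifiers, error at most 0.1, failure probability at most 0.05.
    print(finite_class_sample_bound(1000, epsilon=0.1, delta=0.05))
```

The bound grows only logarithmically in the number of candidate verifiers and in 1/delta, but linearly in 1/epsilon.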

Abstract

Chain-of-Thought reasoning has emerged as a powerful approach for solving complex mathematical and logical problems. However, it can often veer off track through incorrect or unsubstantiated inferences. Formal mathematical reasoning, which can be checked with a formal verifier, is one approach to addressing this issue. However, currently LLMs are simply not good enough to solve complex problems in a formal way, and even just formalizing an informal problem statement can be challenging. Motivated by this fact, in this work we consider the problem of learning reliable verifiers for natural language Chain-of-Thought reasoning. That is, given a problem statement and step-by-step solution in natural language, the aim of the verifier is to output [Yes] if the reasoning steps in the solution are all valid, and [No] otherwise. In this work we give a formal PAC-learning framework for studying…
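
The verifier's task described in the abstract can be sketched as a small interface: a step-level checker is applied to each reasoning step in order, and the overall answer is Yes only if every step is accepted. This is a minimal sketch under stated assumptions, not the paper's construction. The names `verify_chain`, `StepChecker`, and `toy_arithmetic_checker` are hypothetical, and the toy checker is a hand-written stand-in for a learned verifier.

```python
from typing import Callable, Sequence

# Hypothetical signature: (problem, earlier steps, current step) -> is the step valid?
StepChecker = Callable[[str, Sequence[str], str], bool]


def verify_chain(problem: str, steps: Sequence[str], step_ok: StepChecker) -> str:
    """Return "Yes" if every step is accepted given the steps before it, else "No"."""
    for i, step in enumerate(steps):
        if not step_ok(problem, steps[:i], step):
            return "No"
    return "Yes"


def toy_arithmetic_checker(problem: str, previous: Sequence[str], step: str) -> bool:
    """Toy stand-in for a learned verifier: accepts true steps of the form 'a + b = c'."""
    left, _, right = step.partition("=")
    try:
        a, b = (int(term) for term in left.split("+"))
        return a + b == int(right)
    except ValueError:
        # Malformed steps are rejected rather than guessed at.
        return False


if __name__ == "__main__":
    good = ["2 + 3 = 5", "5 + 4 = 9"]
    bad = ["2 + 3 = 5", "5 + 4 = 10"]
    print(verify_chain("Compute 2 + 3 + 4.", good, toy_arithmetic_checker))
    print(verify_chain("Compute 2 + 3 + 4.", bad, toy_arithmetic_checker))
```

Passing the earlier steps to the checker reflects the idea that a step's validity can depend on what has already been derived; the toy checker ignores that context for simplicity.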

Videos

On Learning Verifiers and Implications to Chain-of-Thought Reasoning · SlidesLive

Taxonomy

Topics: Logic, Reasoning, and Knowledge · Logic, programming, and type systems · Machine Learning and Algorithms