On Learning Verifiers and Implications to Chain-of-Thought Reasoning
Maria-Florina Balcan, Avrim Blum, Zhiyuan Li, Dravyansh Sharma

TL;DR
This paper studies how to learn reliable verifiers for natural-language Chain-of-Thought reasoning. It gives a formal PAC-learning framework for the problem and analyzes verification goals of different strengths.
Contribution
It introduces a formal PAC-learning framework for verifiers in Chain-of-Thought reasoning and analyzes their learnability and limitations.
Findings
Sample complexity upper bounds for learning verifiers.
Lower bounds and impossibility results for certain verification goals.
Analysis of verification goals at different strength levels.
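For orientation, the kind of statement a sample complexity upper bound makes can be illustrated with the classical textbook PAC bound for a finite hypothesis class in the realizable setting. This is standard background, not the paper's specific result; the symbols $\mathcal{H}$ (a finite class of candidate verifiers), $\epsilon$ (error tolerance), $\delta$ (failure probability), and $m$ (number of labeled examples) are notation assumed here for illustration.

```latex
% Classical realizable-case PAC bound for a finite hypothesis class.
% Any learner that outputs a hypothesis in \mathcal{H} consistent with
% m i.i.d. labeled examples has error at most \epsilon with probability
% at least 1 - \delta, provided
\[
  m \;\ge\; \frac{1}{\epsilon}\left( \ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta} \right).
\]
```

Lower bounds and impossibility results run in the opposite direction: they show that no learner can succeed with fewer examples, or at all, for a given goal.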
Abstract
Chain-of-Thought reasoning has emerged as a powerful approach for solving complex mathematical and logical problems. However, it can often veer off track through incorrect or unsubstantiated inferences. Formal mathematical reasoning, which can be checked with a formal verifier, is one approach to addressing this issue. However, currently LLMs are simply not good enough to solve complex problems in a formal way, and even just formalizing an informal problem statement can be challenging. Motivated by this fact, in this work we consider the problem of learning reliable verifiers for natural language Chain-of-Thought reasoning. That is, given a problem statement and step-by-step solution in natural language, the aim of the verifier is to output [Yes] if the reasoning steps in the solution are all valid, and [No] otherwise. In this work we give a formal PAC-learning framework for studying…
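The verifier interface described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' construction: the names `verify`, `StepChecker`, and `toy_arithmetic_checker` are hypothetical, and the toy checker only understands steps of the form `a + b = c`. In the paper's setting the step checker would be learned rather than hand-written.

```python
from typing import Callable, List

# Hypothetical type: decides whether `step` is a valid next inference,
# given the problem statement and the steps that came before it.
StepChecker = Callable[[str, List[str], str], bool]


def verify(problem: str, steps: List[str], step_is_valid: StepChecker) -> str:
    """Return "Yes" if every reasoning step is valid, and "No" otherwise."""
    for i, step in enumerate(steps):
        # Each step is judged in the context of the problem and its prefix.
        if not step_is_valid(problem, steps[:i], step):
            return "No"
    return "Yes"


def toy_arithmetic_checker(problem: str, prefix: List[str], step: str) -> bool:
    """Toy stand-in for a learned checker: accepts only true 'a + b = c' steps."""
    lhs, rhs = step.split("=")
    a, b = lhs.split("+")
    return int(a) + int(b) == int(rhs)


good = verify("Add the numbers.", ["1 + 2 = 3", "3 + 4 = 7"], toy_arithmetic_checker)
bad = verify("Add the numbers.", ["1 + 2 = 3", "3 + 4 = 8"], toy_arithmetic_checker)
```

Here `good` is `"Yes"` and `bad` is `"No"`, because the second solution contains one invalid step. A single faulty step is enough to reject the whole chain.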
Taxonomy
Topics: Logic, Reasoning, and Knowledge · Logic, programming, and type systems · Machine Learning and Algorithms
