When to Trust the Cheap Check: Weak and Strong Verification for Reasoning
Shayan Kiyani, Sima Noorani, George Pappas, and Hamed Hassani

TL;DR
This paper explores the balance between cheap, noisy verification methods and costly, reliable verification in large language models, proposing policies to optimize trust decisions.
Contribution
It formalizes the weak-strong verification framework, introduces metrics, characterizes optimal policies, and develops an online algorithm with provable error control.
Findings
Optimal policies have a two-threshold structure.
Calibration and sharpness influence the value of weak verifiers.
The online algorithm provably controls incorrect-acceptance and incorrect-rejection errors without assumptions.
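The two-threshold structure named in the findings can be illustrated with a minimal sketch. It assumes the weak verifier emits a scalar score where higher means "more likely correct"; the names `two_threshold_policy`, `lower`, and `upper` are hypothetical and not taken from the paper, and the thresholds are supplied by hand rather than derived from any optimality condition.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"   # trust the weak check and accept the output
    REJECT = "reject"   # trust the weak check and reject the output
    DEFER = "defer"     # uncertain region: call strong verification


def two_threshold_policy(weak_score: float, lower: float, upper: float) -> Decision:
    """Illustrative two-threshold rule (assumed form, not the paper's exact policy).

    Scores at or above `upper` are accepted, scores at or below `lower`
    are rejected, and everything in between is deferred.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if weak_score >= upper:
        return Decision.ACCEPT
    if weak_score <= lower:
        return Decision.REJECT
    return Decision.DEFER
```

Widening the gap between `lower` and `upper` sends more queries to strong verification and leaves fewer decisions to the noisy weak signal; narrowing it does the opposite.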
Abstract
Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which we call strong verification. These signals differ sharply in cost and reliability: strong verification can establish trust but is resource-intensive, while weak verification is fast and scalable but noisy and imperfect. We formalize this tension through weak-strong verification policies, which decide when to accept or reject based on weak verification and when to defer to strong verification. We introduce metrics capturing incorrect acceptance, incorrect rejection, and strong-verification frequency. At the population level, we show that optimal policies admit a two-threshold structure and that…
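The three metrics named in the abstract can be sketched empirically on a labelled sample. This is a rough illustration under stated assumptions, not the paper's definitions: the function name `evaluate_policy` is hypothetical, each rate is normalized by the total number of queries (the paper may normalize differently), and the policy is assumed to be a simple threshold rule on a scalar weak score.

```python
from typing import Dict, Sequence


def evaluate_policy(
    scores: Sequence[float],
    is_correct: Sequence[bool],
    lower: float,
    upper: float,
) -> Dict[str, float]:
    """Empirical rates for an assumed threshold policy on weak scores.

    `is_correct` holds ground-truth labels (what strong verification
    would report). All rates are fractions of the full sample.
    """
    if len(scores) != len(is_correct) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")

    wrong_accepts = wrong_rejects = deferred = 0
    for score, correct in zip(scores, is_correct):
        if score >= upper:        # accepted on the weak signal alone
            wrong_accepts += not correct
        elif score <= lower:      # rejected on the weak signal alone
            wrong_rejects += bool(correct)
        else:                     # deferred to strong verification
            deferred += 1

    n = len(scores)
    return {
        "incorrect_acceptance": wrong_accepts / n,
        "incorrect_rejection": wrong_rejects / n,
        "strong_verification_frequency": deferred / n,
    }
```

Sweeping `lower` and `upper` over a grid with a function like this traces the trade-off between the two error rates and how often strong verification is called.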
Taxonomy
Topics: Semantic Web and Ontologies · Logic, Reasoning, and Knowledge · Explainable Artificial Intelligence (XAI)
