Pessimistic Verification for Open Ended Math Questions

Yanxing Huang; Zihan Tang; Zejin Lin; Peng Li; Yang Liu

arXiv:2511.21522·cs.AI·November 27, 2025

TL;DR

This paper introduces pessimistic verification, a simple workflow that runs multiple parallel verifications of the same proof and deems the proof incorrect if any one of them reports an error. It improves the verification of open-ended math questions across many benchmarks without requiring substantial computational resources.

Contribution

It proposes pessimistic verification, which improves math verification performance through parallel checks, and shows through case studies that most false negatives in stronger models stem from annotation errors in the original dataset.

Findings

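01 Significantly improves verification performance across many math verification benchmarks

02 Surpasses extended long-CoT in token efficiency under test-time scaling

03 Most false negatives in stronger models trace back to annotation errors in the original dataset, so the method's performance is underestimated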

Abstract

Verification performance is limited chiefly by the ability to detect errors. Building on this intuition, we designed several variants of pessimistic verification, simple workflows that can significantly improve the verification of open-ended math questions. In pessimistic verification, we construct multiple parallel verifications of the same proof, and the proof is deemed incorrect if any one of them reports an error. This simple technique significantly improves performance across many math verification benchmarks without requiring substantial computational resources. Its token efficiency even surpasses that of extended long-CoT in test-time scaling. Our case studies further indicate that the majority of false negatives in stronger models are actually caused by annotation errors in the original dataset, so our method's performance is in fact underestimated.…
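
The decision rule described in the abstract (run several verifications of the same proof in parallel and reject the proof if any one reports an error) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Verifier` type, the function `pessimistic_verify`, and the two toy checks are hypothetical names introduced here, and the toy checks merely stand in for whatever model-based verifier calls the paper actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical verifier type (an assumption, not from the paper):
# takes (question, proof) and returns True when it finds no error,
# False when it reports an error.
Verifier = Callable[[str, str], bool]


def pessimistic_verify(question: str, proof: str,
                       verifiers: Sequence[Verifier]) -> bool:
    """Accept the proof only if every parallel check reports no error."""
    if not verifiers:
        raise ValueError("at least one verifier is required")
    # Run all checks concurrently on the same (question, proof) pair.
    with ThreadPoolExecutor(max_workers=len(verifiers)) as pool:
        verdicts = list(pool.map(lambda v: v(question, proof), verifiers))
    # Pessimistic aggregation: a single reported error rejects the proof.
    return all(verdicts)


# Toy stand-ins for real verifier calls (assumptions for illustration only).
def check_no_handwaving(question: str, proof: str) -> bool:
    # Reports an error if the proof leans on the word "clearly".
    return "clearly" not in proof.lower()


def check_has_conclusion(question: str, proof: str) -> bool:
    # Reports an error if the proof does not end with "QED".
    return proof.strip().endswith("QED")


if __name__ == "__main__":
    checks = [check_no_handwaving, check_has_conclusion]
    question = "Show that n^2 is even when n is even."
    print(pessimistic_verify(
        question, "Let n = 2k. Then n^2 = 4k^2 = 2(2k^2). QED", checks))
    print(pessimistic_verify(question, "Clearly n^2 is even. QED", checks))
```

The aggregation uses `all`, so one dissenting check is enough to reject a proof. This reflects the stated rule that the proof is deemed incorrect if any verification reports an error, in contrast to majority voting over the same verdicts.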

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Model Reduction and Neural Networks · Topic Modeling