Scaling Agentic Verifier for Competitive Coding
Zeyao Ma, Jing Zhang, Xiaokang Zhang, Jiaxi Yang, Zongmeng Zhang, Jiajun Zhang, Yuheng Jing, Lei Zhang, Hao Zheng, Wenting Zhao, Junyang Lin, Binyuan Hui

TL;DR
This paper introduces Agentic Verifier, an execution-based agent that actively reasons and searches for discriminative test inputs to improve the accuracy of large language models in solving competitive programming problems.
Contribution
It proposes a novel, active input generation method using multi-turn reasoning and reinforcement learning to enhance test-time reranking of code solutions.
Findings
Achieves accuracy gains of up to 15% on benchmarks.
Demonstrates effective test-time scaling behavior.
Outperforms existing execution-based reranking methods.
Abstract
Large language models (LLMs) have demonstrated strong coding capabilities but still struggle to solve competitive programming problems correctly in a single attempt. Execution-based re-ranking offers a promising test-time scaling strategy, yet existing methods are constrained by either difficult test case generation or inefficient random input sampling. To address this limitation, we propose Agentic Verifier, an execution-based agent that actively reasons about program behaviors and searches for highly discriminative test inputs that expose behavioral discrepancies among candidate solutions. Through multi-turn interaction with code execution environments, the verifier iteratively refines the candidate input generator and produces targeted counterexamples rather than blindly sampling inputs. We train the verifier to acquire this discriminative input generation capability via a scalable…
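To make the idea of a discriminative test input concrete, here is a minimal sketch of execution-based re-ranking in Python. It is an illustration under stated assumptions, not the paper's implementation: candidates are modelled as plain Python callables, the test inputs are supplied by hand rather than searched for by an agent, and the ranking rule (largest group of behaviorally identical candidates first) is a common heuristic that the abstract does not confirm. All names (`run_safely`, `is_discriminative`, `rerank`) are hypothetical.

```python
from collections import defaultdict

# Assumption: each candidate solution is a Python callable taking one input.
# Outputs are assumed to be hashable so they can be compared and grouped.


def run_safely(candidate, test_input):
    """Execute a candidate; a crash becomes a distinct, comparable outcome."""
    try:
        return ("ok", candidate(test_input))
    except Exception as exc:
        return ("error", type(exc).__name__)


def is_discriminative(candidates, test_input):
    """An input is discriminative if at least two candidates disagree on it."""
    return len({run_safely(c, test_input) for c in candidates}) > 1


def rerank(candidates, inputs):
    """Group candidates by their outputs on discriminative inputs.

    Returns candidate indices ordered so that members of the largest
    behavior group come first, plus the inputs that were kept.
    """
    kept = [x for x in inputs if is_discriminative(candidates, x)]
    clusters = defaultdict(list)
    for idx, candidate in enumerate(candidates):
        signature = tuple(run_safely(candidate, x) for x in kept)
        clusters[signature].append(idx)
    ordered = sorted(clusters.values(), key=len, reverse=True)
    return [idx for cluster in ordered for idx in cluster], kept


# Toy task (hypothetical): return 1 + 2 + ... + n.
def off_by_one(n):
    return sum(range(1, n))  # bug: omits n itself


def closed_form(n):
    return n * (n + 1) // 2


def loop_sum(n):
    return sum(range(1, n + 1))


candidates = [off_by_one, closed_form, loop_sum]
ranking, kept = rerank(candidates, inputs=[0, 5])
print(ranking, kept)
```

In this toy run the input `0` is discarded because all three candidates return the same value on it, while `5` separates the off-by-one candidate from the other two. Blind sampling can spend much of its budget on inputs like `0`; a verifier that reasons about program behavior aims to propose inputs like `5` directly.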
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning and Algorithms · Mobile Crowdsensing and Crowdsourcing
