JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Jiangshan Duo; Hanyu Li; Hailin Zhang; Yudong Wang; Sujian Li; Liang Zhao

arXiv:2601.08468·cs.CL·January 14, 2026

Open Access

TL;DR

JudgeRLVR introduces a two-stage judge-then-generate approach that improves reasoning efficiency and accuracy in large language models by learning to discriminate valid solutions before generation.

Contribution

It proposes a novel judge-then-generate paradigm that enhances reasoning efficiency and accuracy in RLVR by incorporating a discriminative judgment stage.

Findings

01. Achieves an accuracy gain of 3.7 points on in-domain math tasks.

02. Reduces average generation length by 42%.

03. Improves out-of-domain benchmark performance by 4.5 points.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for reasoning in Large Language Models. However, optimizing solely for final-answer correctness often drives models into aimless, verbose exploration, where they rely on exhaustive trial-and-error tactics rather than structured planning to reach solutions. While heuristic constraints like length penalties can reduce verbosity, they often truncate essential reasoning steps, creating a difficult trade-off between efficiency and verification. In this paper, we argue that discriminative capability is a prerequisite for efficient generation: by learning to distinguish valid solutions, a model can internalize a guidance signal that prunes the search space. We propose JudgeRLVR, a two-stage judge-then-generate paradigm. In the first stage, we train the model to judge solution responses with verifiable answers.…
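
As a rough illustration of the first (judging) stage described above, the sketch below shows one way a reward could be derived from a verifiable answer: the model's verdict on a candidate solution is rewarded when it agrees with whether that solution's final answer matches the reference. The function name `judge_reward`, the verdict strings, and the exact-match comparison are assumptions made for illustration; the truncated abstract does not specify them.

```python
def judge_reward(verdict: str, candidate_answer: str, reference_answer: str) -> float:
    """Hypothetical judging-stage reward (illustrative, not the paper's definition).

    Returns 1.0 when the model's verdict agrees with the verifiable label,
    and 0.0 otherwise.
    """
    # Verifiable label: does the candidate's final answer match the reference?
    # Exact string match after stripping whitespace is an assumed simplification.
    is_correct = candidate_answer.strip() == reference_answer.strip()

    # Assumed verdict vocabulary: "correct" means the model judges the solution
    # as valid; any other string is treated as a judgment of "incorrect".
    predicted_correct = verdict.strip().lower() == "correct"

    return 1.0 if predicted_correct == is_correct else 0.0


# Example: the judge is rewarded for accepting a right answer and for
# rejecting a wrong one, and penalized for accepting a wrong one.
rewards = [
    judge_reward("correct", "42", "42"),
    judge_reward("incorrect", "41", "42"),
    judge_reward("correct", "41", "42"),
]
```

Under these assumptions the reward stays fully verifiable, since it depends only on the reference answer and the model's binary verdict.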

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques