PairJudge RM: Perform Best-of-N Sampling with Knockout Tournament
Yantao Liu, Zijun Yao, Rui Min, Yixin Cao, Lei Hou, Juanzi Li

TL;DR
This paper introduces PairJudge RM, a pairwise judgment reward model for best-of-N sampling with LLMs. It selects among candidate solutions through pairwise comparisons organized as a knockout tournament, which improves selection accuracy over baseline reward models.
Contribution
The paper proposes a pairwise judgment reward model and a knockout tournament procedure for evaluating candidate solutions generated by LLMs, supported by a large-scale judgment dataset, PairJudge-432K, and a fine-tuning approach.
Findings
Achieves 40-60% relative improvement on challenging problems.
Outperforms baseline reward models in accuracy.
Effective in large-scale mathematical problem solving.
Abstract
Best-of-N (BoN) sampling, a common strategy for test-time scaling of Large Language Models (LLMs), relies on reward models to select the best candidate solution from multiple generations. However, traditional reward models often assign arbitrary and inconsistent scores, limiting their effectiveness. To address this, we propose a Pairwise Judge Reward Model (PairJudge RM) combined with a knockout tournament for BoN sampling. Instead of assigning absolute scores, PairJudge RM takes one math problem and judges the correctness of two candidate solutions simultaneously, using chain-of-thought reasoning. This approach eliminates the need for scoring and enables cross-validation of solutions through parallel judgment. In the knockout tournament, PairJudge RM conducts pairwise judgments between candidate solutions and iteratively eliminates the incorrect ones. We construct PairJudge-432K, a large-scale…
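The knockout procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Judge` signature, the `toy_judge` stand-in for the model, the neighbour pairing, the bye for an unpaired candidate, and the tie-break rule (keep the first solution when the verdicts do not separate the pair) are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical judge signature: (problem, solution_a, solution_b) -> (a_correct, b_correct).
Judge = Callable[[str, str, str], Tuple[bool, bool]]


def knockout_best_of_n(problem: str, candidates: List[str], judge: Judge) -> str:
    """Return the candidate that survives a single-elimination tournament."""
    if not candidates:
        raise ValueError("need at least one candidate")
    survivors = list(candidates)
    while len(survivors) > 1:
        next_round = []
        # Pair neighbours; with an odd count the last candidate gets a bye.
        for i in range(0, len(survivors) - 1, 2):
            a, b = survivors[i], survivors[i + 1]
            a_ok, b_ok = judge(problem, a, b)
            # Assumed tie-break: if the verdicts do not separate the pair, keep the first.
            next_round.append(b if (b_ok and not a_ok) else a)
        if len(survivors) % 2 == 1:
            next_round.append(survivors[-1])
        survivors = next_round
    return survivors[0]


def toy_judge(problem: str, a: str, b: str) -> Tuple[bool, bool]:
    # Stand-in for the model: a solution counts as correct if it ends with the answer 4.
    return a.endswith("4"), b.endswith("4")


candidates = ["2+2=5", "2+2=4", "2+2=22", "2+2=3", "2+2=4"]
winner = knockout_best_of_n("What is 2+2?", candidates, toy_judge)
print(winner)
```

With N candidates, a single-elimination bracket of this shape makes N - 1 judge calls in total, since every call eliminates exactly one candidate.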
Taxonomy
Topics: Speech and Audio Processing · Image and Signal Denoising Methods · Blind Source Separation Techniques
