CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning
Fangzhou Lin, Shuo Xing, Peiran Li, Siyuan Yang, Qianwen Ge, Kazunori Yamada, Ziming Zhang, Haichong Zhang, Zhengzhong Tu

TL;DR
CAPS is an inference-only framework that adaptively allocates verifier compute in parallel reasoning, substantially reducing verifier cost while improving accuracy across multiple benchmarks.
Contribution
Introduces CAPS, a cascaded adaptive pairwise selection method that optimizes verifier resource allocation for large language model reasoning tasks.
Findings
CAPS outperforms the leading pairwise verifier on 14 of 20 benchmarks.
Uses only 25.4% of the verifier tokens consumed by uniform schedules.
Achieves better accuracy than pointwise self-verification on all tested suites.
Abstract
Parallel reasoning, where a generator samples many candidate solutions and an aggregator selects the best, is one of the most effective forms of test-time scaling in large language models, and pairwise self-verification has become its strongest aggregation primitive. Yet pairwise verification carries a heavy cost: each judgment reads two complete solutions, and existing methods perform tens of such judgments per problem regardless of whether the comparison is informative. We introduce CAPS (Cascaded Adaptive Pairwise Selection), an inference-only framework that allocates verifier compute non-uniformly along two orthogonal axes: an evidence axis that adapts how much of each candidate the judge sees, and a distribution axis that adapts how comparisons are spread across the pool. CAPS instantiates these into a four-stage cascade with an optional rescue subroutine, and admits a…
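The abstract names two axes but does not spell out the four cascade stages, so the following is only a toy Python sketch of the general idea, not the CAPS algorithm. It contrasts a uniform baseline (every pair judged once on full text) with a schedule that (a) on the evidence axis judges a short prefix first and escalates to the full text only when the judge is unsure, and (b) on the distribution axis uses a knockout bracket, which needs n-1 comparisons instead of n(n-1)/2. All names (`toy_judge`, `cascaded_compare`, `adaptive_knockout`), the prefix fractions (0.25, 1.0), the 0.6 confidence threshold, and the knockout bracket itself are hypothetical stand-ins; cost is counted as whitespace tokens read by the judge.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical judge interface: (text_a, text_b) -> (a_is_better, confidence in [0, 1]).
Judge = Callable[[str, str], Tuple[bool, float]]


def n_tokens(text: str) -> int:
    # Crude whitespace token count; a real system would use the model's tokenizer.
    return len(text.split())


def prefix(text: str, frac: float) -> str:
    # Evidence axis: keep only the first `frac` of a candidate's tokens (at least one).
    toks = text.split()
    keep = max(1, int(len(toks) * frac))
    return " ".join(toks[:keep])


def toy_judge(a: str, b: str) -> Tuple[bool, float]:
    # Toy stand-in for an LLM judge: quality is the fraction of "ok" tokens.
    qa = a.split().count("ok") / n_tokens(a)
    qb = b.split().count("ok") / n_tokens(b)
    return qa >= qb, min(1.0, abs(qa - qb) * 4)


def uniform_round_robin(cands: List[str], judge: Judge) -> Tuple[int, int]:
    """Baseline: judge every pair once on full text; select the most wins."""
    wins = [0] * len(cands)
    cost = 0
    for i, j in combinations(range(len(cands)), 2):
        a_wins, _ = judge(cands[i], cands[j])
        cost += n_tokens(cands[i]) + n_tokens(cands[j])
        wins[i if a_wins else j] += 1
    best = max(range(len(cands)), key=lambda k: wins[k])
    return best, cost


def cascaded_compare(a: str, b: str, judge: Judge,
                     fracs: Sequence[float] = (0.25, 1.0),
                     threshold: float = 0.6) -> Tuple[bool, int]:
    """Judge on a short prefix first; show more of each candidate only if unsure."""
    cost = 0
    for frac in fracs:
        pa, pb = prefix(a, frac), prefix(b, frac)
        cost += n_tokens(pa) + n_tokens(pb)
        a_wins, conf = judge(pa, pb)
        if conf >= threshold:
            break  # confident enough: stop reading
    return a_wins, cost


def adaptive_knockout(cands: List[str], judge: Judge) -> Tuple[int, int]:
    """Distribution axis (stand-in): a knockout bracket with cascaded comparisons."""
    alive = list(range(len(cands)))
    cost = 0
    while len(alive) > 1:
        survivors = []
        for k in range(0, len(alive) - 1, 2):
            i, j = alive[k], alive[k + 1]
            a_wins, c = cascaded_compare(cands[i], cands[j], judge)
            cost += c
            survivors.append(i if a_wins else j)
        if len(alive) % 2:  # odd candidate out gets a bye
            survivors.append(alive[-1])
        alive = survivors
    return alive[0], cost


cands = [
    "ok ok ok ok ok ok ok ok",
    "bad bad ok ok ok ok ok ok",
    "bad bad bad bad ok ok ok ok",
    "bad bad bad bad bad bad bad bad",
]
print(uniform_round_robin(cands, toy_judge))  # (0, 96)
print(adaptive_knockout(cands, toy_judge))    # (0, 28)
```

On this toy pool both schedules select candidate 0, but the cascaded knockout reads 28 tokens against 96 for the uniform round robin: most comparisons are settled from two-token prefixes, and only the uninformative prefix comparison between candidates 2 and 3 escalates to full text. These numbers describe the toy only and say nothing about CAPS's reported savings.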
