When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
Zhimin Lin, Yixin Ji, Jinpeng Li, Yu Luo, Dong Li, Junhua Fang, Juntao Li, Min Zhang

TL;DR
This paper introduces a disagreement-guided, training-free framework for dynamically selecting test-time scaling strategies in large reasoning models, improving accuracy and efficiency on mathematical benchmarks.
Contribution
It proposes a novel instance-level routing approach based on output disagreement, enabling adaptive strategy selection without additional training.
Findings
Improves accuracy by 3-7% on mathematical benchmarks.
Reduces sampling cost compared to existing test-time scaling methods.
Effectively handles diverse instance difficulties through dynamic strategy routing.
Abstract
Large Reasoning Models (LRMs) achieve strong performance on mathematical reasoning tasks but remain unreliable on challenging instances. Existing test-time scaling methods, such as repeated sampling, self-correction, and tree search, improve performance at the cost of increased computation, yet often exhibit diminishing returns on hard problems. We observe that output disagreement is strongly correlated with instance difficulty and prediction correctness, providing a useful signal for guiding instance-level strategy selection at test time. Based on this insight, we propose a training-free framework that formulates test-time scaling as an instance-level routing problem: rather than allocating more computation within a single strategy, it dynamically selects among different scaling strategies based on output disagreement. The framework applies lightweight resolution for consistent cases,…
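The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the disagreement measure (one minus the share of the most common answer), the 0.5 threshold, the function names, and the "vote" / "rewrite" labels (borrowed from the paper's title) are all assumptions made for illustration, since the abstract is truncated before it describes how inconsistent cases are handled.

```python
from collections import Counter


def disagreement(answers):
    """Hypothetical disagreement score: 1 - share of the most common answer.

    0.0 means every sampled answer agrees; values near 1.0 mean the
    samples are scattered across many different answers.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


def route(answers, threshold=0.5):
    """Pick a strategy label from the disagreement among sampled answers.

    Assumption: low disagreement is resolved cheaply by majority vote;
    high disagreement is handed to a heavier strategy, labelled
    "rewrite" here as a placeholder. The threshold is illustrative.
    """
    if disagreement(answers) <= threshold:
        majority_answer = Counter(answers).most_common(1)[0][0]
        return "vote", majority_answer
    return "rewrite", None


if __name__ == "__main__":
    print(route(["4", "4", "4", "5"]))   # mostly consistent samples
    print(route(["1", "2", "3", "4"]))   # fully scattered samples
```

In this sketch the router only inspects final answers, so it needs no extra training, which matches the training-free framing. Any real system would have to choose the disagreement measure and threshold empirically.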
