On the Overscaling Curse of Parallel Thinking: System Efficacy Contradicts Sample Efficiency

Yiming Wang; Zhuosheng Zhang; Rui Wang

arXiv:2601.21619·cs.LG·May 12, 2026

TL;DR

This paper analyzes the overscaling curse in parallel thinking systems, introduces LanBo to predict sample-specific optimal budgets, and proposes PreAda, which adapts the budget before decoding to improve hardware efficiency in latency and memory.

Contribution

The paper formally analyzes the overscaling curse, introduces LanBo to predict sample-specific optimal budgets, and develops PreAda, which allocates budgets before decoding to improve efficiency.

Findings

01. LanBo significantly improves budget utilization while maintaining dataset accuracy.

02. PreAda improves hardware efficiency in both latency and memory.

03. The analysis quantifies the prevalence and severity of the overscaling curse in real-world systems.

Abstract

Parallel thinking improves LLM reasoning through multi-path sampling and aggregation. In standard evaluations, due to a lack of sample-specific priors, all samples share a global budget chosen to maximize dataset accuracy. However, many samples reach their best accuracy with much smaller budgets, causing low budget utilization. This contradiction between system efficacy and sample efficiency constitutes the Overscaling Curse. In this paper, we first provide a formal analysis of the overscaling curse and quantify its prevalence and severity in real-world systems. To break it, we propose Latent Budget Predictor (LanBo), which probes model latent representations to predict sample-specific optimal budgets. LanBo significantly improves budget utilization while maintaining dataset accuracy. We further integrate LanBo into the full decoding pipeline, inspiring Pre-decoding Budget Adaptation…
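
The contradiction described in the abstract can be illustrated with a small toy calculation. The sketch below is not the paper's formal definition: the accuracy table `ACC`, the budget grid `BUDGETS`, and the utilization ratio (total budget the samples actually need, divided by the total budget spent under one global setting) are all hypothetical choices made for illustration.

```python
# Hypothetical budget grid: number of parallel reasoning paths per sample.
BUDGETS = [1, 2, 4, 8]

# Hypothetical toy data: ACC[i][j] is the accuracy of sample i at BUDGETS[j].
ACC = [
    [1.0, 1.0, 1.0, 1.0],  # easy sample: one path is already enough
    [0.0, 1.0, 1.0, 1.0],  # needs two paths
    [0.0, 0.0, 0.0, 1.0],  # hard sample: needs the full budget
    [0.0, 0.0, 0.0, 0.0],  # never solved: best accuracy is reached at budget 1
]


def global_budget(acc, budgets):
    """Smallest budget that maximizes mean accuracy over the whole dataset."""
    means = [sum(row[j] for row in acc) / len(acc) for j in range(len(budgets))]
    return budgets[means.index(max(means))]


def sample_optimal_budgets(acc, budgets):
    """For each sample, the smallest budget that attains its own best accuracy."""
    return [budgets[row.index(max(row))] for row in acc]


def budget_utilization(acc, budgets):
    """Assumed metric: budget needed per sample over budget spent globally."""
    spent = global_budget(acc, budgets) * len(acc)
    needed = sum(sample_optimal_budgets(acc, budgets))
    return needed / spent


print(global_budget(ACC, BUDGETS))           # 8
print(sample_optimal_budgets(ACC, BUDGETS))  # [1, 2, 8, 1]
print(budget_utilization(ACC, BUDGETS))      # 0.375
```

In this toy setting the dataset-level optimum is a budget of 8, driven by a single hard sample, while three of the four samples reach their best accuracy with a budget of 1 or 2. A predictor of sample-specific budgets, which is the role the abstract assigns to LanBo, would aim to recover the per-sample values without first decoding at the full budget.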
