SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling
Yang Xiao, Chunpu Xu, Ruifeng Yuan, Jiashuo Wang, Wenjie Li, Pengfei Liu

TL;DR
SCALE introduces a selective resource allocation framework for large language models, improving mathematical reasoning accuracy and efficiency by focusing computational effort on challenging sub-problems during inference.
Contribution
The paper proposes SCALE, a novel framework that dynamically allocates resources based on sub-problem difficulty, overcoming limitations of uniform scaling methods in LLM inference.
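To make the contrast with uniform scaling concrete, here is a minimal sketch of difficulty-weighted budget allocation. It is not the authors' implementation: the `SubProblem` type, the integer difficulty scores, the token budget, and the per-step `floor` are all hypothetical choices made for illustration, and the summary here does not say how SCALE actually scores difficulty or spends its budget.

```python
from dataclasses import dataclass


@dataclass
class SubProblem:
    text: str
    difficulty: int  # hypothetical integer score, e.g. 1 (routine) to 10 (hard)


def uniform_allocation(subproblems, total_budget):
    """Baseline: every sub-problem receives the same share of the budget."""
    share = total_budget // len(subproblems)
    return [share] * len(subproblems)


def selective_allocation(subproblems, total_budget, floor=64):
    """Give each sub-problem a small floor, then split the rest by difficulty.

    `floor` is an assumed minimum so that routine steps are never starved.
    """
    remaining = total_budget - floor * len(subproblems)
    if remaining < 0:
        raise ValueError("total_budget is too small for the requested floor")
    total_difficulty = sum(sp.difficulty for sp in subproblems)
    if total_difficulty == 0:
        # No difficulty signal at all: fall back to the uniform baseline.
        return uniform_allocation(subproblems, total_budget)
    return [
        floor + remaining * sp.difficulty // total_difficulty
        for sp in subproblems
    ]


# Illustrative decomposition with made-up difficulty scores.
steps = [
    SubProblem("expand the product", 1),
    SubProblem("find the key invariant", 8),
    SubProblem("substitute and simplify", 1),
]
uniform = uniform_allocation(steps, 1000)
selective = selective_allocation(steps, 1000, floor=100)
```

With these made-up scores the hard middle step receives most of the budget, while the uniform baseline spends the same amount on every step regardless of difficulty.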
Findings
Achieves up to 13.75% accuracy improvement on AIME25
Reduces computational costs by 33%-53%
Outperforms uniform scaling baselines significantly
Abstract
Test-time compute scaling has emerged as a powerful paradigm for enhancing mathematical reasoning in large language models (LLMs) by allocating additional computational resources during inference. However, current methods employ uniform resource distribution across all reasoning sub-problems, creating fundamental bottlenecks where challenging sub-problems receive insufficient attention while routine operations consume disproportionate resources. This uniform allocation creates performance bottlenecks where additional computational resources yield diminishing returns. Inspired by dual-process theory, we propose SCALE (Selective Resource Allocation), a framework that selectively allocates computational resources based on sub-problem difficulty. SCALE operates through four stages: (1) problem decomposition into sequential reasoning sub-problems, (2) difficulty assessment of each…
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques
