Active Budget Allocation for Efficient Scaling Law Estimation via Surrogate-Guided Pruning

Viktoria Schram; Markus Hiller; Daniel Beck; Trevor Cohn

arXiv:2605.17234·cs.LG·May 19, 2026

TL;DR

This paper combines surrogate-guided pruning with Successive Halving to estimate scaling laws efficiently, cutting computational cost by up to 98.7% relative to exhaustive approaches while improving estimation accuracy (mean relative gains of up to 5.47%).

Contribution

The paper integrates parametric and non-parametric surrogate models with Successive Halving to allocate a given compute budget more systematically when estimating scaling laws.

Findings

01

Surrogate-guided Successive Halving outperforms naive methods in learning curve prediction.

02

The method achieves up to a 98.7% reduction in computational cost compared to exhaustive approaches.

03

It improves the accuracy of scaling law estimation, with mean relative gains of up to 5.47%.

Abstract

Predicting model performance at larger scales enables the design of training strategies and architectures tailored to specific performance targets. Empirical scaling law research identifies functional forms to aid this prediction task. These forms describe the relationship between loss and compute using a loss-compute frontier defined by learning curves. Because the approach is empirical, its computational burden is substantial, which makes strategic resource allocation essential, yet this question remains surprisingly underexplored. In this work, we address this shortcoming by exploring the suitability of Successive Halving (SH) and SH combined with parametric and non-parametric surrogate models. In addition to enabling a more systematic allocation of a given compute budget, our findings show that SH paired with surrogate models yields a set of learning curves that includes one with a lower…
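For readers unfamiliar with Successive Halving, the following is a minimal sketch of the plain algorithm only, not the paper's surrogate-guided variant. The toy loss function, the configuration fields (`name`, `floor`, `scale`), and the parameters `min_budget` and `eta` are all hypothetical choices made for illustration. Each round evaluates every surviving configuration at the current budget, keeps the best fraction, and then increases the budget.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Plain Successive Halving (illustrative sketch).

    configs:    list of candidate configurations
    evaluate:   callable (config, budget) -> loss, lower is better
    min_budget: budget given to every candidate in the first round
    eta:        keep the best 1/eta of candidates per round
    """
    survivors = list(configs)
    budget = min_budget
    total_cost = 0
    while len(survivors) > 1:
        # Evaluate every surviving candidate at the current budget.
        scored = [(evaluate(c, budget), c) for c in survivors]
        # Simplification: each evaluation is charged its full budget,
        # as if training restarted from scratch every round.
        total_cost += budget * len(survivors)
        # Keep the best 1/eta of candidates (at least one).
        scored.sort(key=lambda pair: pair[0])
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        budget *= eta
    return survivors[0], total_cost


def toy_loss(config, budget):
    # Hypothetical learning curve: an irreducible floor plus a term
    # that decays with the budget. Not taken from the paper.
    return config["floor"] + config["scale"] * budget ** -0.5


# Eight hypothetical candidates that differ only in their loss floor.
configs = [
    {"name": f"cfg{i}", "floor": 1.0 + 0.1 * i, "scale": 2.0}
    for i in range(8)
]

best, cost = successive_halving(configs, toy_loss, min_budget=1, eta=2)
print(best["name"], cost)
```

In this toy setup, training all eight candidates at the final budget of 4 would cost 32 units, whereas the halving schedule spends 8 units in each of its three rounds. A surrogate-guided variant would presumably replace or augment the ranking step with predictions from a fitted learning-curve model, but that step is not reproduced here.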
