Active Budget Allocation for Efficient Scaling Law Estimation via Surrogate-Guided Pruning
Viktoria Schram, Markus Hiller, Daniel Beck, Trevor Cohn

TL;DR
This paper combines Successive Halving (SH) with surrogate-guided pruning to estimate scaling laws efficiently, cutting computational cost by up to 98.7% relative to exhaustive approaches while improving estimation accuracy.
Contribution
The paper integrates parametric and non-parametric surrogate models with Successive Halving to allocate a given compute budget more systematically when estimating scaling laws.
Findings
Surrogate-guided Successive Halving outperforms naive methods in learning curve prediction.
The approach reduces computational cost by up to 98.7% compared to exhaustive approaches.
It also improves the accuracy of scaling law estimation, with mean relative gains of up to 5.47%.
Abstract
Predicting model performance at larger scales enables the design of training strategies and architectures tailored to specific performance targets. Empirical scaling law research identifies functional forms to aid this prediction task. These describe the relationship between loss and compute using a loss-compute frontier defined by learning curves. Because this approach is empirical, its computational burden is substantial, which makes strategic resource allocation essential, yet it remains surprisingly underexplored. In this work, we address this shortcoming by exploring the suitability of Successive Halving (SH) and SH combined with parametric and non-parametric surrogate models. In addition to enabling a more systematic allocation of a given compute budget, our findings show that SH paired with surrogate models yields a set of learning curves that includes one with a lower…
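For readers unfamiliar with Successive Halving, the following is a minimal sketch of the plain algorithm only, not the paper's surrogate-guided variant. The learning curve `synthetic_loss`, the step budgets, and the halving factor `eta` are all hypothetical choices made for illustration; a real run would replace `synthetic_loss` with actual training, and a surrogate-guided version would rank configurations by a predicted curve rather than the observed loss.

```python
import random


def synthetic_loss(config_id: int, steps: int) -> float:
    """Hypothetical learning curve: power-law decay toward a per-config floor."""
    rng = random.Random(config_id)      # deterministic per configuration
    floor = 1.0 + rng.random()          # assumed irreducible loss in [1, 2)
    scale = 5.0 + 5.0 * rng.random()    # assumed initial excess loss
    return floor + scale * steps ** -0.5


def successive_halving(config_ids, min_steps=10, eta=2):
    """Plain SH: train all survivors, keep the best 1/eta, grow the budget, repeat.

    Returns (best_config, total_steps_spent, final_step_budget).
    Cost is counted incrementally, assuming training resumes from checkpoints.
    """
    survivors = list(config_ids)
    steps, prev_steps, spent = min_steps, 0, 0
    while len(survivors) > 1:
        # Pay only for the additional steps each survivor trains in this rung.
        spent += (steps - prev_steps) * len(survivors)
        # Rank survivors by their loss at the current budget (lower is better).
        scored = sorted((synthetic_loss(c, steps), c) for c in survivors)
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        prev_steps, steps = steps, steps * eta
    return survivors[0], spent, prev_steps


best, spent, final_steps = successive_halving(range(16))
exhaustive = 16 * final_steps  # cost of training every configuration to the final budget
print(f"best config: {best}, steps spent: {spent}, exhaustive cost: {exhaustive}")
```

With 16 configurations, `min_steps=10`, and `eta=2`, this sketch spends 400 training steps against 1280 for training every configuration to the final budget. The savings in this toy setting say nothing about the figures reported in the paper.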
