Towards Scaling Laws for Symbolic Regression

David Otte; Jörg K.H. Franke; Arbër Zela; Fábio Ferreira; Frank Hutter

arXiv:2510.26064 · cs.LG · February 5, 2026

TL;DR

This paper investigates how the performance of deep learning-based symbolic regression scales with compute, revealing power-law relationships and optimal hyperparameters, thus providing a foundation for future model development.

Contribution

This is the first systematic study of scaling laws in symbolic regression with transformer models. It characterizes how compute affects performance and how compute-optimal hyperparameters scale.

Findings

1. Validation loss and success rate follow power-law trends with compute.
2. Optimal batch size and learning rate increase with model size.
3. A token-to-parameter ratio of approximately 15 is optimal.
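As a rough illustration of what a token-to-parameter ratio of about 15 implies, the sketch below splits a fixed training budget between model size N (parameters) and training data D (tokens). It assumes the common transformer approximation C ≈ 6·N·D FLOPs, which is not stated on this page, and treats the ratio as a fixed constant. The function name `compute_optimal_allocation` is hypothetical and not taken from the paper.

```python
import math


def compute_optimal_allocation(compute_flops: float, tokens_per_param: float = 15.0) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens) at a fixed token-to-parameter ratio.

    Assumption (not from the paper): C ≈ 6 * N * D, a common transformer estimate.
    With D = tokens_per_param * N this gives C = 6 * tokens_per_param * N**2.
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute and ratio must be positive")
    # Solve C = 6 * r * N^2 for N, then derive D from the fixed ratio r.
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Illustrative budgets only; these are not the paper's experimental settings.
    for c in (1e17, 1e18, 1e19):
        n, d = compute_optimal_allocation(c)
        print(f"C={c:.0e} FLOPs -> N≈{n:.3g} params, D≈{d:.3g} tokens")
```

Under these assumptions N and D both grow as the square root of compute, so a 100× larger budget gives roughly a 10× larger model trained on 10× more tokens.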

Abstract

Symbolic regression (SR) aims to discover the underlying mathematical expressions that explain observed data. This holds promise both for gaining scientific insight and for producing inherently interpretable and generalizable models for tabular data. In this work, we focus on the basics of SR. Deep learning-based SR has recently become competitive with genetic programming approaches, but the role of scale has remained largely unexplored. Inspired by scaling laws in language modeling, we present the first systematic investigation of scaling in SR, using a scalable end-to-end transformer pipeline and carefully generated training data. Across five different model sizes and spanning three orders of magnitude in compute, we find that both validation loss and solved rate follow clear power-law trends with compute. We further identify compute-optimal hyperparameter scaling: optimal batch size…
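A power law of the form L(C) = a·C^(−α) is a straight line in log-log space, so a trend such as validation loss versus compute can be estimated with ordinary least squares on log C and log L. The sketch below uses NumPy's `polyfit` for that regression. The helper `fit_power_law` is hypothetical, and the data points are synthetic placeholders generated from made-up constants, not results from the paper.

```python
import numpy as np


def fit_power_law(compute: np.ndarray, loss: np.ndarray) -> tuple[float, float]:
    """Fit loss ≈ a * compute**(-alpha) by linear regression in log-log space.

    Returns (a, alpha). Assumes every input value is strictly positive.
    """
    log_c = np.log(compute)
    log_l = np.log(loss)
    # With deg=1, polyfit returns [slope, intercept] for log_l ≈ slope * log_c + intercept.
    slope, intercept = np.polyfit(log_c, log_l, 1)
    return float(np.exp(intercept)), float(-slope)


if __name__ == "__main__":
    # Synthetic placeholder data spanning three orders of magnitude in compute.
    # The constants a=20 and alpha=0.1 are invented for illustration only.
    compute = np.logspace(16, 19, num=7)
    loss = 20.0 * compute ** -0.1
    a, alpha = fit_power_law(compute, loss)
    print(f"a ≈ {a:.3g}, alpha ≈ {alpha:.3f}")
```

Fitting in log space weights every compute scale equally, which suits data spread over several orders of magnitude; with real, noisy measurements the recovered exponent comes with uncertainty that this sketch does not estimate.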
