Guiding Multi-Objective Genetic Programming with Description Length Improves Symbolic Regression Solutions
Gabriel Kronberger, Fabricio Olivetti de Franca, Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira

TL;DR
This paper evaluates description length (DL) and fractional Bayes factor (FBF) criteria as principled, data-efficient methods for selecting compact, generalizable symbolic regression models in genetic programming, and finds that they outperform traditional heuristics.
Contribution
The paper compares description length and fractional Bayes factor criteria against AIC and BIC for model selection in symbolic regression with genetic programming (GPSR), and provides practical guidance for their use.
Findings
DL/FBF post-selection improves test performance over AIC/BIC.
BIC with DL/FBF complexity penalty yields similar results to DL/FBF post-selection.
Using DL/FBF as a fitness function often causes premature convergence to simple models.
Abstract
Symbolic regression with genetic programming (GPSR) may suffer from overfitting and structural bloat, especially when noise is present. In this paper we evaluate description length (DL) and fractional Bayes factor (FBF) criteria as principled, data-efficient alternatives to heuristics for selecting compact expressions that generalise well. We implement DL using a Fisher-information-based parameter encoding and compare it to AIC and BIC across multiple datasets, including noisy synthetic benchmarks and real-world regression problems. We study three search/selection strategies: (i) multi-objective search for accuracy and program length followed by DL/FBF selection; (ii) multi-objective search using DL directly as an objective; and (iii) single-objective optimisation with DL/FBF as the fitness. Across datasets we find that DL/FBF post-selection improves test performance compared to AIC/BIC…
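To make the post-selection idea in strategy (i) concrete, the following is a minimal sketch of picking one model from a Pareto front with the AIC and BIC baselines that the paper compares against. It is not the authors' implementation and does not compute DL or FBF. It assumes Gaussian noise with the variance set to its maximum-likelihood estimate, and the `front` entries, their `sse` values, and the parameter counts `k` are hypothetical numbers chosen only for illustration.

```python
import math

def gaussian_neg2_loglik(sse, n):
    """-2 * log-likelihood under Gaussian noise, with the variance
    set to its maximum-likelihood estimate sse / n (an assumption)."""
    sigma2 = sse / n
    return n * (math.log(2 * math.pi * sigma2) + 1)

def aic(sse, n, k):
    # AIC = 2k - 2 ln L
    return 2 * k + gaussian_neg2_loglik(sse, n)

def bic(sse, n, k):
    # BIC = k ln n - 2 ln L
    return k * math.log(n) + gaussian_neg2_loglik(sse, n)

def select(front, n, criterion):
    """Return the model on the front with the lowest criterion value."""
    return min(front, key=lambda m: criterion(m["sse"], n, m["k"]))

# Hypothetical Pareto front: each entry is an expression with its
# training sum of squared errors and number of fitted parameters.
front = [
    {"expr": "a*x", "sse": 12.0, "k": 1},
    {"expr": "a*x + b", "sse": 10.0, "k": 2},
    {"expr": "a*x + b*sin(c*x) + d", "sse": 9.9, "k": 4},
]

n_samples = 100
best_aic = select(front, n_samples, aic)
best_bic = select(front, n_samples, bic)
print("AIC picks:", best_aic["expr"])
print("BIC picks:", best_bic["expr"])
```

In this toy front, the largest model reduces the error only marginally, so both criteria prefer the two-parameter expression. A DL or FBF criterion would slot into `select` in the same way, as another function that scores each candidate.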
