The Inefficiency of Genetic Programming for Symbolic Regression
Gabriel Kronberger, Fabricio Olivetti de Franca, Harry Desmond, Deaglan J. Bartlett, Lukas Kammerer

TL;DR
This paper investigates the limitations of genetic programming for symbolic regression by exhaustively enumerating all solutions in small but practically relevant settings and comparing its search efficiency to random search over semantically unique expressions.
Contribution
It introduces improved algorithms for equality saturation to efficiently enumerate semantically unique expressions, enabling a detailed analysis of genetic programming's search behavior.
Findings
Genetic programming explores only a small fraction of unique expressions.
It repeatedly evaluates expressions congruent to already visited ones.
The analysis uses two real-world datasets: Nikuradse and galaxy dynamics.
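To make the redundancy behind these findings concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes a toy grammar (the leaves `x` and a placeholder constant `c`, and the operators `+` and `*`) and uses only commutativity as a rewrite rule, which is a far weaker stand-in for equality saturation. All names in it are hypothetical.

```python
from itertools import product

# Toy grammar (an assumption for illustration, not the paper's setup).
LEAVES = ("x", "c")
OPS = ("+", "*")


def enumerate_trees(depth):
    """Return all syntactically distinct trees of depth <= `depth` as nested tuples."""
    if depth == 0:
        return list(LEAVES)
    smaller = enumerate_trees(depth - 1)
    # Every non-leaf tree is an operator applied to two shallower trees.
    return list(LEAVES) + [
        (op, left, right) for op, left, right in product(OPS, smaller, smaller)
    ]


def canonical(tree):
    """Map a tree to a canonical form by sorting the operands of + and *.

    This handles commutativity only; equality saturation would apply many
    more rewrite rules and merge far more expressions.
    """
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    a, b = sorted((canonical(left), canonical(right)), key=repr)
    return (op, a, b)


all_trees = enumerate_trees(2)
unique_forms = {canonical(t) for t in all_trees}
print(len(all_trees), len(unique_forms))
```

Even in this tiny space, commutativity alone collapses 202 syntactic trees into 74 canonical forms. A search that does not track canonical forms can therefore spend evaluations on expressions it has, in effect, already visited.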
Abstract
We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings, allowing exhaustive enumeration of all solutions. This enables us to quantify the success probability of finding the best possible expressions, and to compare the search efficiency of genetic programming to random search in the space of semantically unique expressions. This analysis is made possible by improved algorithms for equality saturation, which we use to improve the Exhaustive Symbolic Regression algorithm; this produces the set of semantically unique expression structures, orders of magnitude smaller than the full symbolic regression search space. We compare the efficiency of random search in the set of unique expressions and genetic programming. For our experiments we use two real-world datasets where symbolic regression has been used to produce…
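Once the set of unique expressions has been enumerated, the success probability of a plain random-search baseline follows from elementary counting. The sketch below is an illustration under stated assumptions, not the paper's methodology: it assumes uniform sampling, and the function name, its parameters, and the numbers in the usage lines are hypothetical.

```python
from math import comb


def random_search_success(num_unique, num_evaluations, num_optimal=1,
                          with_replacement=False):
    """Probability that uniform random search hits at least one optimal expression.

    num_unique:      size of the set of semantically unique expressions
    num_evaluations: number of expressions the search evaluates
    num_optimal:     how many expressions in the set count as a success
    """
    if with_replacement:
        # Each draw independently misses with probability 1 - m/N.
        return 1.0 - (1.0 - num_optimal / num_unique) ** num_evaluations
    # Without replacement: count the draws that avoid every optimal expression.
    k = min(num_evaluations, num_unique)
    return 1.0 - comb(num_unique - num_optimal, k) / comb(num_unique, k)


# Hypothetical numbers: 100 unique expressions, 10 evaluations, one optimum.
p_without = random_search_success(100, 10)
p_with = random_search_success(100, 10, with_replacement=True)
print(p_without, p_with)
```

Sampling without replacement corresponds to a search that never revisits a unique expression. The with-replacement variant is always somewhat lower, which gives a rough sense of what repeated evaluations cost.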
