Cluster Analysis of a Symbolic Regression Search Space
Gabriel Kronberger, Lukas Kammerer, Bogdan Burlacu, Stephan M. Winkler, Michael Kommenda, Michael Affenzeller

TL;DR
This paper analyzes the distribution and clustering of symbolic regression models generated by genetic programming, revealing insights into model similarity and search space exploration.
Contribution
It introduces a method to cluster symbolic regression models based on phenotypic and genotypic similarity, enhancing understanding of GP search dynamics.
Findings
Phenotypic similarity yields clear clusters
Genotypic similarity does not produce distinct clusters
GP initially explores the entire search space, then converges to high-quality solutions
Abstract
In this chapter we take a closer look at the distribution of symbolic regression models generated by genetic programming in the search space. The motivation for this work is to improve the search for well-fitting symbolic regression models by using information about the similarity of models that can be precomputed independently from the target function. For our analysis, we use a restricted grammar for uni-variate symbolic regression models and generate all possible models up to a fixed length limit. We identify unique models and cluster them based on phenotypic as well as genotypic similarity. We find that phenotypic similarity leads to well-defined clusters while genotypic similarity does not produce a clear clustering. By mapping solution candidates visited by GP to the enumerated search space we find that GP initially explores the whole search space and later converges to the…
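The pipeline described in the abstract (enumerate all models up to a length limit, identify unique models, compare them by phenotype) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the chapter: the toy grammar (`x`, `sin`, `exp`, `+`, `*`), the sampling grid `xs`, the rounding precision, and the choice of squared Pearson correlation of model outputs as the phenotypic similarity. The function names are hypothetical.

```python
import math
from itertools import product

# Hypothetical toy grammar (NOT the chapter's grammar):
#   expr -> x | sin(expr) | exp(expr) | (expr + expr) | (expr * expr)
UNARY = {"sin": math.sin, "exp": math.exp}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def enumerate_models(max_len):
    """Enumerate all expressions with at most max_len symbols.

    Returns (expression string, callable) pairs ordered by length.
    """
    by_len = {1: [("x", lambda x: x)]}
    for n in range(2, max_len + 1):
        models = []
        # Unary node: one symbol plus a child of length n - 1.
        for name, f in UNARY.items():
            for s, g in by_len[n - 1]:
                models.append((f"{name}({s})", lambda x, f=f, g=g: f(g(x))))
        # Binary node: one symbol plus two children whose lengths sum to n - 1.
        for name, op in BINARY.items():
            for k in range(1, n - 1):
                for (s1, g1), (s2, g2) in product(by_len[k], by_len[n - 1 - k]):
                    models.append(
                        (f"({s1} {name} {s2})",
                         lambda x, op=op, g1=g1, g2=g2: op(g1(x), g2(x)))
                    )
        by_len[n] = models
    return [m for n in sorted(by_len) for m in by_len[n]]


def phenotype(f, xs):
    """Centered, unit-norm output vector, rounded so it can serve as a dict key.

    Assumption: models that differ only by offset and positive scale are
    treated as the same phenotype.
    """
    ys = [f(x) for x in xs]
    mean = sum(ys) / len(ys)
    centered = [y - mean for y in ys]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:  # constant model: no variation to normalize
        return tuple(0.0 for _ in ys)
    return tuple(round(c / norm, 6) for c in centered)


def unique_phenotypes(models, xs):
    """Map each distinct phenotype to the first (shortest) expression producing it."""
    unique = {}
    for expr, f in models:
        unique.setdefault(phenotype(f, xs), expr)
    return unique


def similarity(p, q):
    """Squared Pearson correlation of two normalized phenotypes."""
    r = sum(a * b for a, b in zip(p, q))
    return r * r


xs = [-1.0 + 0.1 * i for i in range(21)]  # assumed sampling grid on [-1, 1]
models = enumerate_models(3)
unique = unique_phenotypes(models, xs)
```

Because the phenotypes depend only on the grammar and the sampling grid, they can be computed once and reused for any target function, which is the precomputation idea the abstract refers to. A clustering algorithm of choice could then operate on the pairwise `similarity` values. For larger length limits, nested `exp` calls would need a guard against overflow.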
