Fast Symbolic Regression Benchmarking
Viktor Martinek

TL;DR
This paper introduces a benchmarking method for symbolic regression (SR) that uses curated lists of acceptable expressions and early termination, which yields higher measured rediscovery rates and lower computational cost.
Contribution
It proposes a ground-truth rediscovery benchmark that counts any expression from a curated list of acceptable expressions as a success and terminates the search as soon as one is found, making the evaluation of SR algorithms both more accurate and more efficient.
Findings
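The rediscovery rate of SymbolicRegression.jl increases from 26.7%, as reported by Yoshitomo et al., to 44.7% under the new benchmarking method.
The new benchmarking method reduces the computational expense of the benchmark by 41.2%.
TiSR achieves a 69.4% rediscovery rate while taking 63% less time.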
Abstract
Symbolic regression (SR) uncovers mathematical models from data. Several benchmarks have been proposed to compare the performance of SR algorithms. However, existing ground-truth rediscovery benchmarks overemphasize the recovery of "the one" expression form or rely solely on computer algebra systems (such as SymPy) to assess success. Furthermore, existing benchmarks continue the expression search even after the target expression has been discovered. We address these issues by introducing curated lists of acceptable expressions and a callback mechanism for early termination. As a starting point, we use the symbolic regression for scientific discovery (SRSD) benchmark problems proposed by Yoshitomo et al., and benchmark the two SR packages SymbolicRegression.jl and TiSR. The new benchmarking method increases the rediscovery rate of SymbolicRegression.jl from 26.7%, as reported by Yoshitomo et al., to 44.7%.…
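The two ideas in the abstract, a curated list of acceptable expressions and a callback for early termination, can be illustrated with a minimal Python sketch. This is not the paper's implementation and does not use the API of SymbolicRegression.jl or TiSR. The names `matches_any`, `search_with_early_stop`, `ACCEPTABLE`, and `CANDIDATES` are hypothetical, the "search" is a fixed list of candidates standing in for an SR algorithm, and the numeric comparison on sample points is an assumed stand-in for however the benchmark actually checks a candidate against its curated list.

```python
import math
import random
from typing import Callable, Iterable, Optional, Sequence, Tuple

Expr = Callable[[float], float]  # an expression in one variable, for illustration only


def matches_any(candidate: Expr, acceptable: Sequence[Expr],
                samples: Sequence[float], rel_tol: float = 1e-9) -> bool:
    """Return True if the candidate agrees numerically with any acceptable form.

    Assumption: numeric agreement on sample points stands in for the
    benchmark's real acceptance check.
    """
    for target in acceptable:
        if all(math.isclose(candidate(x), target(x), rel_tol=rel_tol, abs_tol=1e-12)
               for x in samples):
            return True
    return False


def search_with_early_stop(candidates: Iterable[Expr],
                           should_stop: Callable[[Expr], bool]
                           ) -> Tuple[Optional[Expr], int]:
    """Toy search loop: call the callback on each candidate, stop on success."""
    evaluated = 0
    for candidate in candidates:
        evaluated += 1
        if should_stop(candidate):
            return candidate, evaluated  # early termination
    return None, evaluated


# Hypothetical ground truth x^2 + 2x + 1, with two accepted forms.
ACCEPTABLE = [
    lambda x: x**2 + 2 * x + 1,
    lambda x: (x + 1) ** 2,
]

rng = random.Random(0)
SAMPLES = [rng.uniform(0.5, 2.0) for _ in range(20)]

# Stand-in for the stream of expressions an SR algorithm would propose.
CANDIDATES = [
    lambda x: x**2,             # wrong
    lambda x: x * (x + 2) + 1,  # equivalent to the ground truth, different form
    lambda x: x**3,             # never reached
    lambda x: (x + 1) ** 2,     # never reached
]

found, evaluated = search_with_early_stop(
    CANDIDATES, lambda c: matches_any(c, ACCEPTABLE, SAMPLES)
)
print(f"stopped after {evaluated} of {len(CANDIDATES)} candidates")
```

In this toy run the search stops at the second candidate, so the remaining two are never evaluated. That skipped work is the kind of saving early termination is meant to provide, and the second candidate is accepted even though it matches neither listed form literally.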
Taxonomy
Topics: Neural Networks and Applications
