Symbolic regression outperforms other models for small data sets
Casper Wilstrup, Jaan Kasak

TL;DR
This study shows that symbolic regression outperforms traditional machine learning models like random forests and gradient boosting in small data sets, providing better generalization and interpretability.
Contribution
The paper demonstrates that, for small training sets of 250 observations, symbolic regression generalises better than traditional models while remaining interpretable.
Findings
Symbolic regression achieves a higher R² than any of the other models in 132 out of 240 cases.
R² is measured on out-of-sample (validation) data.
It maintains interpretability similar to linear models and decision trees.
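The out-of-sample comparison behind these findings rests on the coefficient of determination evaluated on data the model did not see during fitting. A minimal sketch of that measurement follows, assuming synthetic data and a plain least-squares line as a stand-in model; the data, the helper names `r2_score` and `fit_line`, and the 750-observation validation split are illustrative assumptions, not the paper's data sets or models. Only the training size of 250 is taken from the text.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Closed-form least squares for y = a + b * x (stand-in model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


# Synthetic data (assumption): a linear signal plus Gaussian noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

# 250 training observations, as in the study; the rest is held out.
x_train, y_train = x[:250], y[:250]
x_val, y_val = x[250:], y[250:]

a, b = fit_line(x_train, y_train)
r2_train = r2_score(y_train, [a + b * xi for xi in x_train])
r2_val = r2_score(y_val, [a + b * xi for xi in x_val])
print(f"train: {r2_train:.3f}  validation: {r2_val:.3f}")
```

A model that overfits shows a large gap between the training and validation scores; ranking models by the validation score alone is what a statement such as "higher on out-of-sample data" refers to.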
Abstract
Machine learning is often applied in health science to obtain predictions and new understandings of complex phenomena and relationships, but a lack of sufficient data for model training is a widespread problem. Traditional machine learning techniques, such as random forests and gradient boosting, tend to overfit when working with data sets of only a few hundred observations. This study demonstrates that for small training sets of 250 observations, symbolic regression generalises better to out-of-sample data than traditional machine learning frameworks, as measured by the coefficient of determination R² on the validation set. In 132 out of 240 cases, symbolic regression achieves a higher R² than any of the other models on the out-of-sample data. Furthermore, symbolic regression also preserves the interpretability of linear models and decision trees, an added benefit to its…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Evolutionary Algorithms and Applications
