In-Context Symbolic Regression: Leveraging Large Language Models for Function Discovery
Matteo Merler, Katsiaryna Haitsiukevich, Nicola Dainese, Pekka Marttinen

TL;DR
This paper introduces In-Context Symbolic Regression (ICSR), a framework that uses Large Language Models to discover symbolic equations from data. It matches or outperforms the best SR baselines on four popular benchmarks while yielding simpler equations.
Contribution
The paper presents the first comprehensive framework using LLMs for symbolic regression. It combines iterative refinement of the functional form by an LLM with an external optimizer that determines the coefficients.
Findings
LLMs can successfully find symbolic equations fitting data
ICSR matches or outperforms the best SR baselines on four popular benchmarks
ICSR yields simpler equations with better out-of-distribution generalization
Abstract
State-of-the-art Symbolic Regression (SR) methods currently build specialized models, while the application of Large Language Models (LLMs) remains largely unexplored. In this work, we introduce the first comprehensive framework that utilizes LLMs for the task of SR. We propose In-Context Symbolic Regression (ICSR), an SR method which iteratively refines a functional form with an LLM and determines its coefficients with an external optimizer. ICSR leverages LLMs' strong mathematical prior both to propose an initial set of possible functions given the observations and to refine them based on their errors. Our findings reveal that LLMs are able to successfully find symbolic equations that fit the given data, matching or outperforming the overall performance of the best SR baselines on four popular benchmarks, while yielding simpler equations with better out-of-distribution generalization.
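The propose-fit-refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function `propose_candidates` is a hypothetical stand-in for the LLM that simply returns forms from a fixed pool, and `fit_coefficients` uses a basic compass search as a stand-in for the external optimizer. All names, the candidate pool, and the stopping tolerance are assumptions made for illustration.

```python
import math


def propose_candidates(history):
    """Hypothetical stand-in for the LLM: returns one untried form from a fixed pool.

    A real system would prompt a model with the data and the (form, error) history.
    """
    pool = [
        ("c0*x + c1", 2, lambda x, c: c[0] * x + c[1]),
        ("c0*sin(x) + c1", 2, lambda x, c: c[0] * math.sin(x) + c[1]),
        ("c0*x**2 + c1", 2, lambda x, c: c[0] * x ** 2 + c[1]),
    ]
    tried = {text for text, _ in history}
    return [cand for cand in pool if cand[0] not in tried][:1]


def mse(f, coeffs, xs, ys):
    """Mean squared error of form f with the given coefficients."""
    return sum((f(x, coeffs) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_coefficients(f, n_coeffs, xs, ys, iters=200):
    """Compass search: a simple derivative-free stand-in for the external optimizer."""
    coeffs = [1.0] * n_coeffs
    best = mse(f, coeffs, xs, ys)
    step = 1.0
    for _ in range(iters):
        improved = False
        for i in range(n_coeffs):
            for delta in (step, -step):
                trial = list(coeffs)
                trial[i] += delta
                err = mse(f, trial, xs, ys)
                if err < best:
                    coeffs, best, improved = trial, err, True
        if not improved:
            step /= 2.0  # shrink the step once no neighbour improves the fit
    return coeffs, best


def icsr_sketch(xs, ys, max_rounds=5, tol=1e-8):
    """Alternate between proposing forms and fitting their coefficients."""
    history = []  # (form text, error) pairs fed back to the proposer
    best = None
    for _ in range(max_rounds):
        candidates = propose_candidates(history)
        if not candidates:
            break
        for text, n_coeffs, f in candidates:
            coeffs, err = fit_coefficients(f, n_coeffs, xs, ys)
            history.append((text, err))
            if best is None or err < best[2]:
                best = (text, coeffs, err)
        if best[2] < tol:
            break
    return best


# Toy data generated from y = 2*x**2 + 3 (an assumed example, not from the paper).
xs = [i / 2 for i in range(-4, 5)]
ys = [2 * x ** 2 + 3 for x in xs]
form, coeffs, err = icsr_sketch(xs, ys)
print(form, coeffs, err)
```

The sketch keeps the two roles separate, as the abstract does: the proposer only chooses the shape of the equation, while the numeric optimizer alone is responsible for the coefficient values.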
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
