In Context Learning and Reasoning for Symbolic Regression with Large Language Models
Samiha Sharlin, Tyler R. Josephson

TL;DR
This paper investigates how large language models like GPT-4 can be used for symbolic regression by prompting them with data and scientific context, enabling equation rediscovery and integration of domain knowledge.
Contribution
It demonstrates a novel workflow using LLMs with chain-of-thought prompting and external tools for symbolic regression, incorporating scientific context and constraints.
Findings
GPT-4 and GPT-4o successfully rediscovered known equations.
Performance improves when the models reason in a scratchpad and consider the scientific context before proposing expressions.
Natural language prompts facilitate integration of theory and data in symbolic regression.
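To make the findings above concrete, here is a minimal sketch of how data, scientific context, and prior results might be combined into a single natural-language prompt that asks for scratchpad reasoning first. The function name `build_prompt`, the prompt wording, the history record fields, and the sample values are all hypothetical; this is not the authors' prompt.

```python
def build_prompt(points, context, history):
    """Assemble a symbolic-regression prompt (hypothetical wording, not the paper's)."""
    data_lines = "\n".join(f"x={x}, y={y}" for x, y in points)
    if history:
        # Feed back earlier expressions with their complexity and loss.
        prior = "\n".join(
            f"{h['expr']} | complexity={h['complexity']} | mse={h['mse']:.4g}"
            for h in history
        )
    else:
        prior = "(none yet)"
    return (
        "Scientific context:\n" + context + "\n\n"
        "Data:\n" + data_lines + "\n\n"
        "Previous expressions:\n" + prior + "\n\n"
        "Scratchpad: first analyze the data, the previous expressions, "
        "and the context.\n"
        "Then propose new expressions that lower both complexity and loss."
    )


# Illustrative call with made-up values.
prompt = build_prompt(
    points=[(0.5, 0.4), (1.0, 0.67), (2.0, 1.0)],
    context="Gas adsorption on a surface; x is pressure, y is loading.",
    history=[{"expr": "c0*x", "complexity": 1, "mse": 0.12}],
)
print(prompt)
```

Keeping the context as free text is the point of the design: theory and constraints can be stated in plain language rather than encoded as a grammar or a search prior.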
Abstract
Large Language Models (LLMs) are transformer-based machine learning models that have shown remarkable performance in tasks for which they were not explicitly trained. Here, we explore the potential of LLMs to perform symbolic regression -- a machine-learning method for finding simple and accurate equations from datasets. We prompt GPT-4 and GPT-4o models to suggest expressions from data, which are then optimized and evaluated using external Python tools. These results are fed back to the LLMs, which propose improved expressions while optimizing for complexity and loss. Using chain-of-thought prompting, we instruct the models to analyze data, prior expressions, and the scientific context (expressed in natural language) for each problem before generating new expressions. We evaluated the workflow in rediscovery of the Langmuir and dual-site Langmuir models for adsorption, along with…
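The "optimize and evaluate" step described in the abstract can be illustrated with a small sketch. It assumes the LLM has proposed the Langmuir form q = qmax*K*p/(1 + K*p), fits the two constants by ordinary least squares on the linearized form p/q = p/qmax + 1/(qmax*K), and then reports a loss and a complexity score. The linearized fit, the operator-count complexity measure, the helper names, and the synthetic data are all assumptions made for illustration; the paper's own tools and metrics may differ.

```python
import ast


def complexity(expr):
    """Count operator and call nodes in the expression (an assumed measure)."""
    tree = ast.parse(expr, mode="eval")
    return sum(
        isinstance(node, (ast.BinOp, ast.UnaryOp, ast.Call))
        for node in ast.walk(tree)
    )


def mse(expr, consts, xs, ys):
    """Mean squared error of expr over the data, with fitted constants."""
    code = compile(ast.parse(expr, mode="eval"), "<expr>", "eval")
    total = 0.0
    for x, y in zip(xs, ys):
        # eval on model output is unsafe in general; builtins are disabled here.
        pred = eval(code, {"__builtins__": {}}, {"x": x, **consts})
        total += (pred - y) ** 2
    return total / len(xs)


def fit_langmuir(ps, qs):
    """Least-squares fit of p/q = p/qmax + 1/(qmax*K); returns (qmax, K)."""
    zs = [p / q for p, q in zip(ps, qs)]
    n = len(ps)
    mean_p = sum(ps) / n
    mean_z = sum(zs) / n
    slope = sum((p - mean_p) * (z - mean_z) for p, z in zip(ps, zs)) / sum(
        (p - mean_p) ** 2 for p in ps
    )
    intercept = mean_z - slope * mean_p
    return 1.0 / slope, slope / intercept


# Synthetic, noise-free data generated from qmax=2.0, K=0.5 (not the paper's data).
ps = [0.5, 1.0, 2.0, 4.0, 8.0]
qs = [2.0 * 0.5 * p / (1 + 0.5 * p) for p in ps]

candidate = "c0*c1*x/(1+c1*x)"  # expression as an LLM might propose it
qmax, K = fit_langmuir(ps, qs)
loss = mse(candidate, {"c0": qmax, "c1": K}, ps, qs)
score = complexity(candidate)
print(qmax, K, loss, score)
```

In a full loop, the `(candidate, score, loss)` triple would be appended to a history and returned to the model in the next prompt, so that it can trade accuracy against simplicity.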
