Automated Statistical Model Discovery with Language Models
Michael Y. Li, Emily B. Fox, Noah D. Goodman

TL;DR
This paper uses large language models (LMs) to automate statistical model discovery: within the framework of Box's Loop, the LM iteratively proposes candidate models and critiques them, removing the need for a handcrafted search procedure.
Contribution
It introduces a language-model-driven method for statistical model discovery that draws on the LM's domain knowledge without requiring a domain-specific language of models or a manually designed search procedure.
Findings
Identifies models comparable to human expert-designed models.
Extends classic models with interpretable modifications.
Effective in restricted, open-ended, and expert-guided modeling scenarios.
Abstract
Statistical model discovery is a challenging search over a vast space of models subject to domain-specific constraints. Efficiently searching over this space requires expertise in modeling and the problem domain. Motivated by the domain knowledge and programming capabilities of large language models (LMs), we introduce a method for language model driven automated statistical model discovery. We cast our automated procedure within the principled framework of Box's Loop: the LM iterates between proposing statistical models represented as probabilistic programs, acting as a modeler, and critiquing those models, acting as a domain expert. By leveraging LMs, we do not have to define a domain-specific language of models or design a handcrafted search procedure, which are key restrictions of previous systems. We evaluate our method in three settings in probabilistic modeling: searching within…
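The propose-and-critique iteration described in the abstract can be pictured with a minimal sketch. Everything named here (`box_loop`, `Candidate`, and the three callables `propose`, `fit_and_score`, `critique`) is hypothetical and chosen for illustration. The abstract does not specify the prompts, the scoring criterion, or the inference engine, so the LM calls are replaced by deterministic stubs and the score is a toy number, not a real measure of model fit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    program: str   # source text of a probabilistic program (a plain string here)
    score: float   # higher is better; the real criterion is an assumption


def box_loop(
    propose: Callable[[str, List[Candidate], int], List[str]],
    fit_and_score: Callable[[str], float],
    critique: Callable[[List[Candidate]], str],
    rounds: int = 3,
    proposals_per_round: int = 2,
) -> Candidate:
    """Alternate between proposing models and critiquing the best ones."""
    history: List[Candidate] = []
    feedback = ""  # no critique is available before the first round
    for _ in range(rounds):
        # Modeler step: ask for new programs given the last critique.
        for program in propose(feedback, history, proposals_per_round):
            history.append(Candidate(program, fit_and_score(program)))
        # Keep candidates ordered from best to worst.
        history.sort(key=lambda c: c.score, reverse=True)
        # Domain-expert step: critique the current top candidates.
        feedback = critique(history[:proposals_per_round])
    return history[0]


# Deterministic stubs standing in for LM calls and model fitting.
def stub_propose(feedback: str, history: List[Candidate], k: int) -> List[str]:
    start = len(history)
    return [f"model_{start + i}" for i in range(k)]


def stub_score(program: str) -> float:
    # Toy score that peaks at model_3.
    return -abs(int(program.split("_")[1]) - 3)


def stub_critique(top: List[Candidate]) -> str:
    return f"best so far: {top[0].program}"


best = box_loop(stub_propose, stub_score, stub_critique)
print(best.program, best.score)
```

In a real system the stubs would be replaced by LM calls that return probabilistic programs and by a routine that fits each program to data; the loop structure is the only part this sketch is meant to convey.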
Taxonomy
Topics: Data Mining Algorithms and Applications
