Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics
Michael A. Riegler, Kristoffer Herland Hellton, Vajira Thambawita, Hugo L. Hammer

TL;DR
This paper explores using large language models (LLMs) to generate and verify informative prior distributions for Bayesian analysis. It shows both their potential and the difficulty of aligning the suggested priors with the data.
Contribution
It introduces a prompting approach in which LLMs suggest, verify, and reflect on priors, and it evaluates the approach's effectiveness and limitations on two real datasets.
Findings
LLMs correctly identified the direction of every association in the data.
Claude and Gemini provided better priors than ChatGPT.
Moderately informative priors were often overconfident; weakly informative priors sometimes defaulted to vague means.
Abstract
Selecting prior distributions in Bayesian statistics is challenging, resource-intensive, and subjective. We analyze using large language models (LLMs) to suggest suitable, knowledge-based informative priors. We developed an extensive prompt asking LLMs not only to suggest priors but also to verify and reflect on their choices. We evaluated Claude Opus, Gemini 2.5 Pro, and ChatGPT-4o-mini on two real datasets: heart disease risk and concrete strength. All LLMs correctly identified the direction of all associations (e.g., that heart disease risk is higher for males). The quality of the suggested priors was measured by their Kullback-Leibler divergence from the distribution of the maximum likelihood estimator (MLE). The LLMs suggested both moderately and weakly informative priors. The moderate priors were often overconfident, resulting in distributions misaligned with the data. In our experiments,…
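The abstract scores each suggested prior by its Kullback-Leibler (KL) divergence from the distribution of the maximum likelihood estimator. The following is a minimal sketch of that kind of score, not the authors' code. It assumes that both the prior and the MLE's sampling distribution are univariate normal, with the MLE approximated as N(β̂, SE²). It also assumes the divergence is taken as KL(MLE ‖ prior); this summary does not state the paper's exact direction or parameterization. The function name `kl_normal` and all numbers below are hypothetical.

```python
import math


def kl_normal(mu_p: float, sd_p: float, mu_q: float, sd_q: float) -> float:
    """Closed-form KL(P || Q) for P = N(mu_p, sd_p^2) and Q = N(mu_q, sd_q^2)."""
    if sd_p <= 0 or sd_q <= 0:
        raise ValueError("standard deviations must be positive")
    return math.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q) ** 2) / (2 * sd_q**2) - 0.5


# Hypothetical MLE for one coefficient (e.g. a log-odds effect) and its standard error.
beta_hat, se = 1.0, 0.2

# Hypothetical LLM-suggested priors for that coefficient, given as (mean, sd).
priors = {
    "overconfident": (0.5, 0.1),  # narrow and centred away from beta_hat
    "weak": (0.0, 2.0),           # wide and centred at zero
    "calibrated": (0.9, 0.3),     # close to beta_hat with a similar spread
}

# Assumed direction: KL(MLE distribution || prior); smaller means better aligned.
scores = {name: kl_normal(beta_hat, se, m, s) for name, (m, s) in priors.items()}
for name, kl in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>13}: KL = {kl:.3f}")
```

Under these assumptions, the narrow prior centred away from β̂ receives a much larger divergence than the wide prior. This mirrors the abstract's observation that overconfident moderate priors ended up misaligned with the data.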
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
