Foundation Priors

Sanjog Misra

arXiv:2512.01107 · cs.AI · December 2, 2025

Open Access

TL;DR

This paper introduces the concept of foundation priors, a way to incorporate model-generated outputs as subjective priors in empirical research, emphasizing their dependence on user expectations and trust, and providing a framework for their use in statistical workflows.

Contribution

It formalizes the foundation prior as a generalized Bayesian update, enabling principled integration of synthetic data into empirical analysis while accounting for subjectivity and trust.
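
As a rough illustration of what a trust-weighted generalized Bayesian update can look like, the sketch below uses a standard power-likelihood (tempered) update for a normal mean with known noise variance. This is a textbook stand-in, not the paper's derivation: the function name `tempered_normal_update`, the `trust` parameter, and the normal-normal setup are all assumptions made for the example.

```python
import numpy as np


def tempered_normal_update(prior_mean, prior_var, synthetic, noise_var, trust):
    """Power-likelihood update for a normal mean (illustrative assumption only).

    The updated density is proportional to
        prior(theta) * likelihood(synthetic | theta) ** trust,
    so trust = 0 ignores the synthetic draws and trust = 1 treats them
    exactly like real observations.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    n = synthetic.size
    if n == 0 or trust == 0:
        # No synthetic information is used: the prior is returned unchanged.
        return float(prior_mean), float(prior_var)

    prior_prec = 1.0 / prior_var
    # Tempering scales the precision contributed by the synthetic sample.
    data_prec = trust * n / noise_var
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * synthetic.mean())
    return float(post_mean), float(post_var)


# Hypothetical synthetic draws, e.g. numeric answers elicited from a model.
draws = [2.0, 2.0, 2.0, 2.0]
full_trust = tempered_normal_update(0.0, 1.0, draws, noise_var=1.0, trust=1.0)
low_trust = tempered_normal_update(0.0, 1.0, draws, noise_var=1.0, trust=0.25)
```

In this toy setup, lowering `trust` pulls the updated mean back toward the prior mean and leaves more posterior variance, which is the qualitative behaviour one would expect when synthetic outputs are discounted relative to real data.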

Findings

01. Foundation priors depend on user trust and expectations.

02. Synthetic data can be integrated into statistical workflows.

03. Framework helps avoid conflating synthetic outputs with real data.

Abstract

Foundation models, and in particular large language models, can generate highly informative responses, prompting growing interest in using these "synthetic" outputs as data in empirical research and decision-making. This paper introduces the idea of a foundation prior, under which model-generated outputs are treated not as real observations but as draws from the prior predictive distribution induced by the foundation prior. As such, synthetic data reflect both the model's learned patterns and the user's subjective priors, expectations, and biases. We model the subjectivity of the generative process by making explicit the dependence of synthetic outputs on the user's anticipated data distribution, the prompt-engineering process, and the trust placed in the foundation model. We derive the foundation prior as an exponential-tilted, generalized Bayesian update of the user's primitive prior, where a…

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Explainable Artificial Intelligence (XAI) · Forecasting Techniques and Applications