
TL;DR
This paper introduces the concept of foundation priors, a way to incorporate model-generated outputs as subjective priors in empirical research, emphasizing their dependence on user expectations and trust, and providing a framework for their use in statistical workflows.
Contribution
It formalizes the foundation prior as a generalized Bayesian update, enabling principled integration of synthetic data into empirical analysis while accounting for subjectivity and trust.
Findings
Foundation priors depend on user trust and expectations.
Synthetic data can be integrated into statistical workflows.
Framework helps avoid conflating synthetic outputs with real data.
Abstract
Foundation models, and in particular large language models, can generate highly informative responses, prompting growing interest in using these "synthetic" outputs as data in empirical research and decision-making. This paper introduces the idea of a foundation prior, under which model-generated outputs are treated not as real observations but as draws from the prior predictive distribution induced by the foundation prior. As such, synthetic data reflect both the model's learned patterns and the user's subjective priors, expectations, and biases. We model the subjectivity of the generative process by making explicit the dependence of synthetic outputs on the user's anticipated data distribution, the prompt-engineering process, and the trust placed in the foundation model. We derive the foundation prior as an exponentially tilted, generalized Bayesian update of the user's primitive prior, where a…
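The abstract is truncated before it states the exact form of the update, so the following is only a generic illustration of exponential tilting, not the paper's derivation. It assumes a Bernoulli rate on a discrete grid, a uniform primitive prior, and a hypothetical `trust` parameter in [0, 1] that raises the likelihood of the synthetic outputs to a power before multiplying it into the primitive prior. All function and variable names are invented for this sketch.

```python
import math

def tilted_prior(grid, primitive, synth_successes, synth_trials, trust):
    """Tilt a primitive prior toward synthetic outputs on a grid of Bernoulli rates.

    Assumed form (illustrative only): prior(theta) * likelihood(theta)**trust,
    renormalized over the grid.
    """
    log_w = []
    for theta, p0 in zip(grid, primitive):
        # Log-likelihood of the synthetic outputs under rate theta.
        loglik = (synth_successes * math.log(theta)
                  + (synth_trials - synth_successes) * math.log(1.0 - theta))
        log_w.append(math.log(p0) + trust * loglik)
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(v - shift) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def mean(grid, probs):
    """Expected value of theta under a discrete distribution on the grid."""
    return sum(t * p for t, p in zip(grid, probs))

# Grid over (0, 1), excluding the endpoints so the logs stay finite.
grid = [i / 100 for i in range(1, 100)]
primitive = [1.0 / len(grid)] * len(grid)  # uniform primitive prior (assumption)

# Hypothetical synthetic outputs: 8 "successes" in 10 model-generated draws.
no_trust = tilted_prior(grid, primitive, 8, 10, trust=0.0)
some_trust = tilted_prior(grid, primitive, 8, 10, trust=0.5)
full_trust = tilted_prior(grid, primitive, 8, 10, trust=1.0)
```

With `trust=0.0` the synthetic outputs are ignored and the primitive prior is returned unchanged. With `trust=1.0` they are weighted exactly as real observations would be in an ordinary Bayesian update. Intermediate values move the prior mean only part of the way toward the synthetic data.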
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Explainable Artificial Intelligence (XAI) · Forecasting Techniques and Applications
