Exchangeable Sequence Models Quantify Uncertainty Over Latent Concepts
Naimeng Ye, Hongseok Namkoong

TL;DR
This paper demonstrates that pre-trained sequence models can naturally perform probabilistic reasoning over exchangeable data, effectively quantifying uncertainty and updating beliefs as new data is observed, aligning with Bayesian principles.
Contribution
It introduces a framework connecting exchangeable sequence modeling with Bayesian inference, showing how pre-trained autoregressive models can explicitly quantify uncertainty over latent environments.
Findings
Sequence prediction loss correlates with uncertainty quality.
Exchangeability can be encoded via data augmentation, regularization, and causal masking.
Pre-trained models perform implicit Bayesian inference over data.
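The findings above can be illustrated with a toy example. The sketch below is not the paper's method: it uses a Beta-Bernoulli predictive rule as a stand-in for a pre-trained sequence model, and every name in it (predictive_prob_one, sequence_prob, sample_latent_mean, belief_spread) is hypothetical. It checks that the autoregressive joint probability is invariant to reordering (exchangeability), and that rolling the predictive forward gives a spread of long-run means that narrows as more data is observed.

```python
import random
from itertools import permutations


def predictive_prob_one(ones, n, a=1.0, b=1.0):
    """P(next = 1 | n past observations containing `ones` ones).

    Assumption: a Beta(a, b)-Bernoulli predictive rule stands in for
    a trained autoregressive model's next-observation distribution.
    """
    return (a + ones) / (a + b + n)


def sequence_prob(seq, a=1.0, b=1.0):
    """Joint probability of a binary sequence via the autoregressive chain rule."""
    prob, ones = 1.0, 0
    for n, y in enumerate(seq):
        p1 = predictive_prob_one(ones, n, a, b)
        prob *= p1 if y == 1 else 1.0 - p1
        ones += y
    return prob


def sample_latent_mean(history, horizon, rng, a=1.0, b=1.0):
    """Impute `horizon` unseen observations and return the completed sequence's mean.

    In the predictive view, this long-run mean plays the role of the latent parameter.
    """
    ones, n = sum(history), len(history)
    for _ in range(horizon):
        y = 1 if rng.random() < predictive_prob_one(ones, n, a, b) else 0
        ones += y
        n += 1
    return ones / n


def belief_spread(history, n_rollouts=200, horizon=300, seed=0):
    """Mean and standard deviation of the long-run mean across forward rollouts."""
    rng = random.Random(seed)
    draws = [sample_latent_mean(history, horizon, rng) for _ in range(n_rollouts)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    return mean, var ** 0.5


if __name__ == "__main__":
    # Exchangeability: every ordering of the same observations has the same probability.
    probs = {round(sequence_prob(p), 12) for p in permutations([1, 1, 0, 1])}
    print("distinct joint probabilities across orderings:", probs)

    # Uncertainty over the latent mean narrows as the observed history grows.
    print("spread after 2 observations:", belief_spread([1, 0])[1])
    print("spread after 40 observations:", belief_spread([1, 0] * 20)[1])
```

In this toy setting no prior or likelihood is sampled directly at inference time; uncertainty comes only from imputing unobserved data with the one-step predictive.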
Abstract
Intelligent agents must be able to articulate their own uncertainty. In this work, we show that pre-trained sequence models are naturally capable of probabilistic reasoning over exchangeable data points: they form informed beliefs and sharpen them as they gather more information. A sequence model learns the relationship between observations, which differs from typical Bayesian models that quantify uncertainty over latent parameters through priors and likelihoods (e.g., topic models). Despite the apparent difference, we illustrate how exchangeable sequence modeling provides a valid Bayesian model by going back to De Finetti's classical predictive view of probabilistic reasoning: uncertainty comes from data that has not been observed yet, rather than latent parameters. From this perspective, pre-training autoregressive models is equivalent to formulating informed beliefs based on prior…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Semantic Web and Ontologies
