Express Your Doubts -- Probabilistic World Modeling Should not be Based on Token logprobs

Eitan Wagner; Omri Abend

arXiv:2505.02072·cs.CL·May 13, 2026

TL;DR

This position paper argues that reading world probabilities off the token log probabilities (logprobs) of large language models is flawed, and advocates instead for second-order prediction, in which the model states probabilities explicitly as part of its output, as a theoretically sounder alternative.

Contribution

It identifies theoretical and practical problems with using token logprobs as estimates of world probabilities and proposes second-order prediction as a sounder alternative.

Findings

01. Different training phases and use cases imply distinct, potentially conflicting, desired output distributions for token logprobs.

02. Treating output probabilities as event probabilities can therefore be misleading.

03. Second-order prediction offers a theoretically sound approach.
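The first two findings can be illustrated with a toy calculation. This is a minimal sketch under our own assumptions, not an experiment from the paper: a hypothetical binary event with world probability 0.7, and two simplified objectives standing in for distribution estimation (expected log loss) and response prediction (expected 0/1 reward for a sampled answer). The names `P_RAIN`, `expected_log_loss`, and `expected_accuracy_reward` are illustrative only.

```python
import math

# Hypothetical world: the event "rain" occurs with probability 0.7.
# The number is illustrative and not taken from the paper.
P_RAIN = 0.7


def expected_log_loss(q_rain: float, p_rain: float = P_RAIN) -> float:
    """Distribution-estimation objective: expected negative log-likelihood
    of the realized outcome when the model outputs q_rain for "rain"."""
    return -(p_rain * math.log(q_rain) + (1 - p_rain) * math.log(1 - q_rain))


def expected_accuracy_reward(q_rain: float, p_rain: float = P_RAIN) -> float:
    """Simplified response-prediction objective: the model samples one answer
    from its output distribution and earns reward 1 only if that answer
    matches the realized outcome."""
    return p_rain * q_rain + (1 - p_rain) * (1 - q_rain)


# Candidate output probabilities for "rain". Log loss is undefined at 0 and 1,
# so its grid excludes the endpoints; the reward grid includes them.
interior_grid = [i / 100 for i in range(1, 100)]
full_grid = [i / 100 for i in range(0, 101)]

best_q_log_loss = min(interior_grid, key=expected_log_loss)
best_q_reward = max(full_grid, key=expected_accuracy_reward)

print(f"log-loss optimum: P(rain) = {best_q_log_loss:.2f}")
print(f"reward optimum:   P(rain) = {best_q_reward:.2f}")
```

In this toy setting the log-loss optimum reproduces the world probability, while the reward optimum puts all mass on the more likely answer. Reading the second model's output probability as P(rain) would report certainty where the world offers only 0.7. Real training pipelines are far more complex; the sketch only shows why the two settings can disagree.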

Abstract

Language modeling has shifted in recent years from a distribution over strings to prediction models with textual inputs and outputs for general-purpose tasks. This position paper highlights the often overlooked implications of this shift for the use of large language models (LLMs) as probability estimators, especially for world probabilities. In light of the theoretical distinction between distribution estimation and response prediction, we examine LLM training phases and common use cases for LLM output probabilities. We show that the different settings lead to distinct, potentially conflicting, desired output distributions. This lack of clarity leads to pitfalls when using output probabilities as event probabilities. Our position advocates for second-order prediction -- incorporating probabilities explicitly as part of the output -- as a theoretically sound method, in contrast to using…
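A minimal sketch of how a downstream consumer might handle a second-order prediction, where the probability is read from the generated text rather than from the logprob of an answer token. The output convention `Probability: <number>`, the helper `parse_stated_probability`, and the example responses and outcomes are all hypothetical; the abstract does not specify a format. Scoring the stated probabilities with the Brier score is our choice of a standard proper scoring rule, not necessarily the paper's.

```python
import re
from typing import List, Optional

# Hypothetical convention: the model states the event probability explicitly,
# e.g. "It will likely rain tomorrow. Probability: 0.8".
PROB_PATTERN = re.compile(r"Probability:\s*([01](?:\.\d+)?)\b")


def parse_stated_probability(response: str) -> Optional[float]:
    """Extract an explicitly stated probability in [0, 1], or None if absent/invalid."""
    match = PROB_PATTERN.search(response)
    if match is None:
        return None
    value = float(match.group(1))
    return value if 0.0 <= value <= 1.0 else None


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


responses = [
    "It will likely rain tomorrow. Probability: 0.8",
    "A market crash this week is unlikely. Probability: 0.3",
    "The team will probably win. Probability: 0.6",
]
outcomes = [1, 0, 1]  # hypothetical realized events (1 = the event occurred)

parsed = [(parse_stated_probability(r), o) for r, o in zip(responses, outcomes)]
# Drop responses in which no valid probability was stated.
valid = [(p, o) for p, o in parsed if p is not None]
score = brier_score([p for p, _ in valid], [o for _, o in valid])
print(f"Brier score over {len(valid)} predictions: {score:.4f}")
```

Because the probability is part of the output, it can be evaluated (or trained) directly against realized events with a proper scoring rule, independently of how the model's token distribution was shaped by its training objective.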
