Implicit Probabilistic Reasoning Does Not Reflect Explicit Answers in Large Language Models

Manuel Mondal; Ljiljana Dolamic; Gérôme Bovet; Philippe Cudré-Mauroux; Julien Audiffren

arXiv:2406.14986 · cs.AI · February 12, 2026 · 1 citation

Open Access

TL;DR

This paper compares explicit and implicit probabilistic reasoning in large language models, revealing that models perform well on explicit tasks but often diverge from true probabilities during implicit reasoning in text generation.

Contribution

The paper introduces implicit probabilistic reasoning, an alternative evaluation method that tests whether models integrate probabilities into text generation, and highlights discrepancies between models' explicit answers and that implicit reasoning.

Findings

1. Models perform well on explicit probabilistic reasoning (MCQs)

2. Implicit reasoning predictions often diverge from true probabilities

3. Implicit reasoning is influenced by prior events and background info

Abstract

The handling of probabilities in the form of uncertainty or partial information is an essential task for LLMs in many settings and applications. A common approach to evaluate an LLM's probabilistic reasoning capabilities is to assess its ability to answer questions pertaining to probability through the use of multiple-choice questions (MCQs). However, this paradigm, which we refer to as explicit probabilistic reasoning, has been shown in the literature to yield significant limitations (e.g., sensitivity to answer ordering). In this work, we introduce an alternative approach, named implicit probabilistic reasoning, which evaluates the models' ability to integrate probabilistic reasoning into their text generation process. To achieve this, we rephrase MCQs as text-completion scenarios with a determined set of outcomes and compare the model's next-token probability assignments to the true…
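The comparison described in the abstract, between a model's next-token probabilities over a fixed set of outcomes and the true distribution, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the urn scenario, the logit values, the helper names `outcome_distribution` and `total_variation`, and the choice of total variation distance as the divergence measure. In practice the logits would come from a model's output at the completion position.

```python
import math


def outcome_distribution(logits):
    """Softmax restricted to the candidate outcome tokens.

    `logits` maps each outcome token to a (hypothetical) next-token logit.
    Renormalizing over the outcome set ignores mass on all other tokens.
    """
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical text-completion scenario: an urn holds 3 red balls and 1 blue
# ball, and the prompt ends with "The ball she drew was".
true_dist = {"red": 0.75, "blue": 0.25}

# Made-up logits standing in for a model's scores on the two outcome tokens.
hypothetical_logits = {"red": 1.0, "blue": 0.0}

model_dist = outcome_distribution(hypothetical_logits)
gap = total_variation(true_dist, model_dist)

print(model_dist)
print(f"total variation distance: {gap:.4f}")
```

A gap of zero would mean the model's implicit distribution matches the true one exactly; larger values indicate the kind of divergence the findings describe. Other measures, such as KL divergence, could be substituted for `total_variation` without changing the structure of the comparison.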

Taxonomy

Topics: Topic Modeling

Methods: Hierarchical Information Threading