Implicit Probabilistic Reasoning Does Not Reflect Explicit Answers in Large Language Models
Manuel Mondal, Ljiljana Dolamic, Gérôme Bovet, Philippe Cudré-Mauroux, Julien Audiffren

TL;DR
This paper compares explicit and implicit probabilistic reasoning in large language models, revealing that models perform well on explicit tasks but often diverge from true probabilities during implicit reasoning in text generation.
Contribution
The paper introduces an alternative evaluation method, implicit probabilistic reasoning, and uses it to highlight discrepancies between models' explicit answers and the probabilities they assign during text generation.
Findings
Models perform well on explicit probabilistic reasoning (MCQs)
Implicit reasoning predictions often diverge from true probabilities
Implicit reasoning is influenced by prior events and background information
Abstract
The handling of probabilities in the form of uncertainty or partial information is an essential task for LLMs in many settings and applications. A common approach to evaluate an LLM's probabilistic reasoning capabilities is to assess its ability to answer questions pertaining to probability through the use of multiple-choice questions (MCQs). However, this paradigm, which we refer to as explicit probabilistic reasoning, has been shown in the literature to yield significant limitations (e.g., sensitivity to answer ordering). In this work, we introduce an alternative approach, named implicit probabilistic reasoning, which evaluates the models' ability to integrate probabilistic reasoning into their text generation process. To achieve this, we rephrase MCQs as text-completion scenarios with a determined set of outcomes and compare the model's next-token probability assignments to the true…
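The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the model's next-token log-probabilities are already available as a dictionary, that they are renormalized over the fixed set of outcome tokens, and that total variation distance is used as the gap measure. None of these choices is confirmed by the text above; the function names, the coin-flip scenario, and the numbers are hypothetical.

```python
import math

def outcome_distribution(token_logprobs, outcomes):
    """Renormalize next-token log-probabilities over a fixed set of outcomes.

    Assumption: probability mass on tokens outside `outcomes` is discarded.
    """
    weights = [math.exp(token_logprobs[o]) for o in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}

def total_variation(p, q):
    """Total variation distance between two distributions on the same keys."""
    return 0.5 * sum(abs(p[o] - q[o]) for o in p)

# Hypothetical text-completion scenario: "I flip a fair coin. It lands on ..."
true_probs = {"heads": 0.5, "tails": 0.5}

# Made-up log-probabilities standing in for a real model's next-token output.
fake_logprobs = {
    "heads": math.log(0.6),
    "tails": math.log(0.2),
    "the": math.log(0.2),  # off-outcome token, dropped by renormalization
}

implicit = outcome_distribution(fake_logprobs, list(true_probs))
gap = total_variation(implicit, true_probs)
print(implicit, gap)
```

A gap of zero would mean the model's implicit distribution matches the true probabilities exactly; larger values indicate the kind of divergence the findings describe.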
Taxonomy
Topics: Topic Modeling
Methods: Hierarchical Information Threading
