Humans and LLMs Diverge on Probabilistic Inferences
Gaurav Kamath, Sreenath Madathil, Sebastian Schuster, Marie-Catherine de Marneffe, Siva Reddy

TL;DR
This paper introduces ProbCOPA, a dataset of 210 handcrafted probabilistic inferences in English with human likelihood annotations, and shows that eight state-of-the-art reasoning LLMs consistently fail to produce human-like response distributions.
Contribution
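The paper presents ProbCOPA, a new dataset for probabilistic inference in which each item is annotated for inference likelihood by 25-30 human participants. It compares these human judgments with responses from eight reasoning LLMs and finds that the models consistently fail to match the human distributions.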
Findings
Humans show graded, varied probabilistic judgments.
LLMs fail to produce human-like probabilistic distributions.
A common reasoning pattern is identified in LLMs.
Abstract
Human reasoning often involves working over limited information to arrive at probabilistic conclusions. In its simplest form, this involves making an inference that is not strictly entailed by a premise, but rather only likely given the premise. While reasoning LLMs have demonstrated strong performance on logical and mathematical tasks, their behavior on such open-ended, non-deterministic inferences remains largely unexplored. We introduce ProbCOPA, a dataset of 210 handcrafted probabilistic inferences in English, each annotated for inference likelihood by 25–30 human participants. We find that human responses are graded and varied, revealing probabilistic judgments of the inferences in our dataset. Comparing these judgments with responses from eight state-of-the-art reasoning LLMs, we show that models consistently fail to produce human-like distributions. Finally, analyzing LLM…
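To make "human-like distributions" concrete, one could compare, for a single item, the sample of human likelihood ratings with a sample of model ratings. The sketch below is not the paper's evaluation method, since the truncated abstract does not name a metric. It assumes ratings on a hypothetical 0-100 scale and uses the 1-D Wasserstein distance as one illustrative choice. The function name and all rating values are invented for illustration.

```python
from bisect import bisect_right


def wasserstein_1d(sample_a, sample_b):
    """Return the 1-D Wasserstein-1 distance between two empirical samples.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and F_b
    are the empirical CDFs. The samples may have different sizes.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    total = 0.0
    # Both CDFs are constant between consecutive observed values,
    # so the integral reduces to a sum over those intervals.
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical ratings for one inference (assumed 0-100 likelihood scale).
human_ratings = [35, 50, 55, 60, 70, 80, 90]  # graded and varied
model_ratings = [75, 75, 80, 80, 80]          # tightly clustered

distance = wasserstein_1d(human_ratings, model_ratings)
print(f"Wasserstein-1 distance: {distance:.2f}")
```

A distance of zero means the two samples have identical empirical distributions. Larger values mean more probability mass must be moved to turn one distribution into the other, so a model that always gives nearly the same rating would score poorly against a spread-out human sample.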
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Topic Modeling · Decision-Making and Behavioral Economics
