Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events

James A. Michaelov; Reeka Estacio; Zhien Zhang; Benjamin K. Bergen

arXiv:2506.06808 · cs.CL · June 10, 2025

TL;DR

This paper shows that current language models do not reliably assign higher probabilities to improbable but possible events than to impossible ones; under certain conditions, every model tested performs worse than chance.

Contribution

The study tests whether language models can reliably distinguish impossible events from merely improbable ones, teasing apart the effects of possibility, typicality, and contextual relatedness.

Findings

01

Language models do not consistently assign higher probabilities to unlikely but possible sentences than to impossible ones.

02

Under certain conditions, all models tested, including Llama 3, Gemma 2, and Mistral NeMo, perform at worse-than-chance levels.

03

Despite the results of previous work, models' ability to rank improbable events above impossible ones is far from robust.
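
In a two-way comparison between an impossible and an improbable sentence, chance is 50%, so "worse than chance" means the model prefers the impossible sentence in more than half of the pairs. The sketch below shows one generic way to check whether an observed accuracy is reliably below chance: an exact one-sided binomial test. This is an assumption for illustration only; the page does not say which statistical procedure the authors used, and the function names are hypothetical.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def below_chance_p_value(correct: int, total: int) -> float:
    """One-sided exact binomial test against chance (p = 0.5) in a two-way choice.

    Returns the probability of getting `correct` or fewer right answers out of
    `total` pairs if the model were guessing at random. A small value suggests
    accuracy is reliably below chance, not merely at chance.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("require total > 0 and 0 <= correct <= total")
    return binomial_cdf(correct, total, 0.5)


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(below_chance_p_value(30, 100))  # well below 0.05
    print(below_chance_p_value(50, 100))  # about 0.54: consistent with chance
```

A one-sided test fits here because the question is specifically whether accuracy falls below 50%; a two-sided test would instead ask whether accuracy differs from chance in either direction.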

Abstract

Can language models reliably predict that possible events are more likely than merely improbable ones? By teasing apart possibility, typicality, and contextual relatedness, we show that despite the results of previous work, language models' ability to do this is far from robust. In fact, under certain conditions, all models tested - including Llama 3, Gemma 2, and Mistral NeMo - perform at worse-than-chance level, assigning higher probabilities to impossible sentences such as 'the car was given a parking ticket by the brake' than to merely unlikely sentences such as 'the car was given a parking ticket by the explorer'.
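
The comparison described in the abstract can be sketched as a pairwise test: score each sentence by its log-probability under a model, and count a pair as correct when the merely unlikely sentence outscores the impossible one. This is a minimal sketch under stated assumptions: per-token log-probabilities are taken as already extracted from a model (obtaining them from Llama 3, Gemma 2, or Mistral NeMo is not shown), and the score is the summed log-probability of the whole sentence, whereas the authors may score only the critical word. `SentencePair`, `sentence_logprob`, `prefers_possible`, and `pairwise_accuracy` are hypothetical names, and the numbers are made up, not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SentencePair:
    # Hypothetical container: per-token log-probabilities for each sentence.
    improbable: Sequence[float]  # possible but unlikely, e.g. "... by the explorer"
    impossible: Sequence[float]  # impossible, e.g. "... by the brake"


def sentence_logprob(token_logprobs: Sequence[float]) -> float:
    """Log-probability of a sentence: the sum of its per-token log-probabilities."""
    return sum(token_logprobs)


def prefers_possible(pair: SentencePair) -> bool:
    """True if the model rates the unlikely-but-possible sentence as more probable."""
    return sentence_logprob(pair.improbable) > sentence_logprob(pair.impossible)


def pairwise_accuracy(pairs: Sequence[SentencePair]) -> float:
    """Fraction of pairs where the possible sentence wins; chance is 0.5."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(prefers_possible(p) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up log-probabilities for illustration only (not data from the paper).
    pairs = [
        SentencePair(improbable=[-2.0, -9.5], impossible=[-2.0, -8.1]),  # impossible wins
        SentencePair(improbable=[-1.5, -6.0], impossible=[-1.5, -7.2]),  # possible wins
    ]
    print(pairwise_accuracy(pairs))  # 0.5
```

Because the two example sentences in the abstract share the prefix "the car was given a parking ticket by the", their summed log-probabilities differ only from the final word onward (assuming the shared prefix is tokenized identically), so the whole-sentence comparison reduces to comparing the model's predictions for "explorer" versus "brake" in that context.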


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Language and cultural evolution