The Imperfective Paradox in Large Language Models

Bolei Ma; Yusuke Miyao

arXiv:2601.09373 · cs.CL · April 23, 2026

TL;DR

This paper investigates whether large language models genuinely understand event semantics. It finds that, in aspectual inference especially, they rely on heuristics and exhibit systematic biases rather than reasoning logically.

Contribution

The study introduces ImperfectiveNLI, a diagnostic dataset, and uncovers systematic biases in LLMs' handling of aspectual semantics, highlighting limitations in their reasoning capabilities.

Findings

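01. Models show a Teleological Bias: they hallucinate completion for goal-oriented events.

02. Prompting interventions partially reduce the bias but miscalibrate the models, which then reject valid entailments for atelic verbs.

03. Embeddings distinguish the forms, but inference relies on priors.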

Abstract

Do Large Language Models (LLMs) genuinely grasp the compositional semantics of events, or do they rely on surface-level probabilistic heuristics? We investigate the Imperfective Paradox, a logical phenomenon where the past progressive aspect entails event realization for activities (e.g., running $\to$ ran) but not for accomplishments (e.g., building $\nrightarrow$ built). We introduce ImperfectiveNLI, a diagnostic dataset designed to probe this distinction across diverse semantic classes. Evaluating state-of-the-art open-weight models, we uncover a pervasive Teleological Bias: models systematically hallucinate completion for goal-oriented events, even overriding explicit textual cancellation. Prompting interventions partially reduce this bias but trigger a calibration crisis, causing models to incorrectly reject valid entailments for atelic verbs. Representational analyses further show…
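To make the activity/accomplishment contrast concrete, the following is a minimal sketch of how premise-hypothesis pairs of this kind could be built and scored. It is not the ImperfectiveNLI format or the authors' evaluation code: the verb lists, the `NLIItem` fields, the label strings, and the two rates returned by `bias_rates` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical verb phrases for illustration; not drawn from ImperfectiveNLI.
ACTIVITIES = {"running": "ran", "swimming": "swam"}  # atelic
ACCOMPLISHMENTS = {  # telic
    "building a house": "built a house",
    "writing a letter": "wrote a letter",
}


@dataclass(frozen=True)
class NLIItem:
    premise: str     # past progressive sentence
    hypothesis: str  # simple past sentence
    gold: str        # "entailment" or "non-entailment" (assumed label set)
    telic: bool      # True for accomplishments, False for activities


def make_items(subject="Mary"):
    """Build one premise-hypothesis pair per verb phrase."""
    items = []
    for progressive, past in ACTIVITIES.items():
        # "was running" entails "ran".
        items.append(NLIItem(f"{subject} was {progressive}.",
                             f"{subject} {past}.", "entailment", False))
    for progressive, past in ACCOMPLISHMENTS.items():
        # "was building a house" does not entail "built a house".
        items.append(NLIItem(f"{subject} was {progressive}.",
                             f"{subject} {past}.", "non-entailment", True))
    return items


def bias_rates(items, predictions):
    """Return (false-completion rate on telic items,
    false-rejection rate on atelic items)."""
    telic = [p for i, p in zip(items, predictions) if i.telic]
    atelic = [p for i, p in zip(items, predictions) if not i.telic]
    false_completion = sum(p == "entailment" for p in telic) / len(telic)
    false_rejection = sum(p != "entailment" for p in atelic) / len(atelic)
    return false_completion, false_rejection


if __name__ == "__main__":
    items = make_items()
    # A predictor that always answers "entailment" is maximally "teleological".
    print(bias_rates(items, ["entailment"] * len(items)))
```

Under these assumptions, a high first rate corresponds to the completion bias described in the abstract, and a high second rate corresponds to the over-correction on atelic verbs that the prompting interventions trigger.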
