The Imperfective Paradox in Large Language Models
Bolei Ma, Yusuke Miyao

TL;DR
This paper investigates whether large language models truly understand event semantics, revealing they rely on heuristics and exhibit biases, especially in aspectual inference, rather than genuine logical reasoning.
Contribution
The study introduces ImperfectiveNLI, a diagnostic dataset, and uncovers systematic biases in LLMs' handling of aspectual semantics, highlighting limitations in their reasoning capabilities.
Findings
Models show a Teleological Bias, hallucinating goal completion.
Prompting reduces bias but causes calibration issues.
Embeddings distinguish forms, but inference relies on priors.
Abstract
Do Large Language Models (LLMs) genuinely grasp the compositional semantics of events, or do they rely on surface-level probabilistic heuristics? We investigate the Imperfective Paradox, a logical phenomenon where the past progressive aspect entails event realization for activities (e.g., running → ran) but not for accomplishments (e.g., building ↛ built). We introduce ImperfectiveNLI, a diagnostic dataset designed to probe this distinction across diverse semantic classes. Evaluating state-of-the-art open-weight models, we uncover a pervasive Teleological Bias: models systematically hallucinate completion for goal-oriented events, even overriding explicit textual cancellation. Prompting interventions partially reduce this bias but trigger a calibration crisis, causing models to incorrectly reject valid entailments for atelic verbs. Representational analyses further show…
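To make the activity/accomplishment contrast concrete, here is a minimal Python sketch of how premise-hypothesis pairs of this kind could be built. It is an illustration only, not the ImperfectiveNLI format or the authors' construction method: the `NLIItem` class, the `VERBS` entries, the `make_item` helper, and the label strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NLIItem:
    premise: str      # sentence in the past progressive
    hypothesis: str   # same event in the simple past
    label: str        # "entailment" or "non-entailment" (assumed label names)


# Hypothetical examples: (subject, past progressive, simple past, is_telic).
# is_telic=False marks an activity; is_telic=True marks an accomplishment.
VERBS = [
    ("Mary", "was running", "ran", False),
    ("John", "was building a house", "built a house", True),
]


def make_item(subject: str, progressive: str, simple_past: str, is_telic: bool) -> NLIItem:
    """Pair a progressive premise with its simple-past hypothesis.

    For activities the progressive entails the simple past; for
    accomplishments it does not, because the goal may never be reached.
    """
    label = "non-entailment" if is_telic else "entailment"
    return NLIItem(
        premise=f"{subject} {progressive}.",
        hypothesis=f"{subject} {simple_past}.",
        label=label,
    )


items = [make_item(*entry) for entry in VERBS]

for item in items:
    print(f"{item.premise} => {item.hypothesis} [{item.label}]")
```

A model showing the Teleological Bias described above would be expected to label the second pair as an entailment, treating "was building a house" as if the house had been finished.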
