The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts
Sangmitra Madhusudan, Kaige Chen, and Ali Emami

TL;DR
This paper introduces CenterBench, a dataset designed to evaluate whether language models understand syntax or rely on semantic shortcuts, revealing that models increasingly abandon structural analysis in favor of pattern matching as sentence complexity grows.
Contribution
The paper presents CenterBench, a novel dataset and framework to measure when language models switch from structural understanding to semantic shortcuts during comprehension tasks.
Findings
Models rely increasingly on semantic shortcuts as sentence complexity grows.
Performance gaps between plausible and implausible sentences grow with nesting depth.
Semantic plausibility negatively impacts model accuracy on causal reasoning questions.
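The gap described in the findings above can be illustrated with a small sketch. It assumes each model answer is logged as a hypothetical `(depth, plausible, correct)` tuple and reports the difference in mean accuracy between plausible and implausible sentences at each nesting depth, in percentage points. The record layout and the function name `plausibility_gap` are assumptions for illustration; the paper's exact aggregation is not given in this summary.

```python
from collections import defaultdict


def plausibility_gap(records):
    """Return {depth: accuracy gap in percentage points}.

    records: iterable of (depth, plausible, correct) tuples, where
    `plausible` and `correct` are booleans. This layout is a hypothetical
    logging format, not the dataset's actual schema.
    """
    # (depth, plausible) -> [number correct, number answered]
    totals = defaultdict(lambda: [0, 0])
    for depth, plausible, correct in records:
        cell = totals[(depth, plausible)]
        cell[0] += int(correct)
        cell[1] += 1

    gaps = {}
    for depth in sorted({d for d, _ in totals}):
        # Only report depths where both conditions were observed.
        if (depth, True) in totals and (depth, False) in totals:
            acc_plausible = totals[(depth, True)][0] / totals[(depth, True)][1]
            acc_implausible = totals[(depth, False)][0] / totals[(depth, False)][1]
            gaps[depth] = 100.0 * (acc_plausible - acc_implausible)
    return gaps


# Toy records, invented purely to exercise the function.
toy_records = [
    (1, True, True), (1, True, True),
    (1, False, True), (1, False, False),
    (2, True, True), (2, False, False),
]
toy_gaps = plausibility_gap(toy_records)
```

A positive gap that grows with depth would correspond to the pattern the findings describe: accuracy holds up when the sentence matches familiar associations and drops when it does not.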
Abstract
When language models correctly parse "The cat that the dog chased meowed," are they analyzing syntax or simply familiar with dogs chasing cats? Despite extensive benchmarking, we lack methods to distinguish structural understanding from semantic pattern matching. We introduce CenterBench, a dataset of 9,720 comprehension questions on center-embedded sentences (like "The cat [that the dog chased] meowed") where relative clauses nest recursively, creating processing demands from simple to deeply nested structures. Each sentence has a syntactically identical but semantically implausible counterpart (e.g., mailmen prescribe medicine, doctors deliver mail) and six comprehension questions testing surface understanding, syntactic dependencies, and causal reasoning. Testing six models reveals that performance gaps between plausible and implausible sentences widen systematically with complexity,…
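To make the recursive nesting concrete, here is a minimal sketch of how a center-embedded sentence and its who-did-what dependencies could be assembled from a list of nouns and verbs. The helper names `center_embed` and `dependencies`, and the convention that `verbs[i]` is the verb whose subject is `nouns[i]`, are assumptions for illustration and not the authors' generation pipeline.

```python
def center_embed(nouns, verbs):
    """Build a center-embedded sentence.

    nouns[0] is the main subject; each later noun is the subject of a
    relative clause modifying the noun before it. verbs[i] is the verb
    whose subject is nouns[i], so verbs surface in reverse noun order.
    """
    if not nouns or len(nouns) != len(verbs):
        raise ValueError("need one verb per noun, and at least one noun")
    subject_chain = " that the ".join(nouns)
    verb_chain = " ".join(reversed(verbs))
    return f"The {subject_chain} {verb_chain}."


def dependencies(nouns, verbs):
    """Return (agent, verb, patient) triples implied by the structure.

    The main clause has no patient in this simplified intransitive form.
    """
    triples = [(nouns[0], verbs[0], None)]
    for i in range(1, len(nouns)):
        triples.append((nouns[i], verbs[i], nouns[i - 1]))
    return triples


depth_one = center_embed(["cat", "dog"], ["meowed", "chased"])
depth_two = center_embed(["cat", "dog", "man"], ["meowed", "chased", "owned"])
roles = dependencies(["cat", "dog", "man"], ["meowed", "chased", "owned"])
```

Here `depth_one` reproduces the abstract's example sentence, and `roles` lists the answer key a syntactic-dependency question would need, for instance that the dog, not the man, did the chasing. Swapping in nouns and verbs that violate typical associations would give a structurally identical but implausible counterpart under the same assumptions.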
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Text Readability and Simplification
