The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts

Sangmitra Madhusudan, Kaige Chen, and Ali Emami

arXiv:2510.20543 · cs.CL · January 21, 2026

TL;DR

This paper introduces CenterBench, a dataset designed to evaluate whether language models understand syntax or rely on semantic shortcuts, revealing that models increasingly abandon structural analysis in favor of pattern matching as sentence complexity grows.

Contribution

The paper presents CenterBench, a dataset and evaluation framework for measuring when language models switch from structural understanding to semantic shortcuts during comprehension tasks.

Findings

01. Models rely increasingly on semantic shortcuts as sentence complexity grows.

02. Performance gaps between plausible and implausible sentences grow with nesting depth.

03. Semantic plausibility negatively impacts model accuracy on causal reasoning questions.

Abstract

When language models correctly parse "The cat that the dog chased meowed," are they analyzing syntax or simply familiar with dogs chasing cats? Despite extensive benchmarking, we lack methods to distinguish structural understanding from semantic pattern matching. We introduce CenterBench, a dataset of 9,720 comprehension questions on center-embedded sentences (like "The cat [that the dog chased] meowed") where relative clauses nest recursively, creating processing demands from simple to deeply nested structures. Each sentence has a syntactically identical but semantically implausible counterpart (e.g., mailmen prescribe medicine, doctors deliver mail) and six comprehension questions testing surface understanding, syntactic dependencies, and causal reasoning. Testing six models reveals that performance gaps between plausible and implausible sentences widen systematically with complexity,…
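The center-embedding pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' generation code: the function names `center_embed` and `swap_roles` are hypothetical, and reversing the noun order is only one assumed way to produce a syntactically identical but semantically implausible counterpart.

```python
def center_embed(nouns, verbs):
    """Build a center-embedded sentence (illustrative helper, not from the paper).

    nouns[i] is the subject of verbs[i]; nouns[0] and verbs[0] form the
    main clause, and each later pair is nested one level deeper.
    """
    if not nouns or len(nouns) != len(verbs):
        raise ValueError("need equally many nouns and verbs, at least one each")
    # Subjects appear outermost-first, each inner one introduced by "that".
    subjects = " that ".join(f"the {noun}" for noun in nouns)
    # Verbs close the clauses innermost-first, so they appear in reverse order.
    predicates = " ".join(reversed(verbs))
    sentence = f"{subjects} {predicates}."
    return sentence[0].upper() + sentence[1:]


def swap_roles(nouns):
    """Assumed way to build an implausible counterpart: reverse the nouns
    while keeping the verbs, so the syntax is unchanged."""
    return list(reversed(nouns))


nouns = ["cat", "dog"]
verbs = ["meowed", "chased"]

print(center_embed(nouns, verbs))
# The cat that the dog chased meowed.
print(center_embed(swap_roles(nouns), verbs))
# The dog that the cat chased meowed.
```

Adding one more noun and verb pair nests one level deeper: `center_embed(["cat", "dog", "boy"], ["meowed", "chased", "owned"])` gives "The cat that the dog that the boy owned chased meowed." Because both versions share the same structure, a model that answers the plausible version correctly but fails the swapped one is likely leaning on familiar associations rather than on the syntax.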

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Text Readability and Simplification