The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
Yubo Li, Lu Zhang, Tianchong Jiang, Ramayya Krishnan, Rema Padman

TL;DR
This paper investigates how large language models follow salient surface heuristics even when these conflict with implicit (unstated) constraints, revealing systematic failures and proposing a benchmark to measure and address this reasoning vulnerability.
Contribution
It introduces the Heuristic Override Benchmark (HOB), which evaluates how often LLMs let surface cues override implicit constraints, and explores methods to mitigate this issue.
Findings
Under strict evaluation (10/10 correct), none of the 14 models evaluated on HOB exceeds 75% accuracy, and presence constraints are hardest (44%).
A minimal hint (e.g., emphasizing the key object) improves performance by 15 percentage points on average.
Removing constraints worsens performance in most models, indicating conservative bias.
Abstract
Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword associations than compositional inference. The Heuristic Override Benchmark (HOB), 500 instances spanning 4 heuristic by 5 constraint families with minimal pairs and explicitness gradients, demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75%, and presence constraints are hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers +15 pp on average, suggesting the…
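The abstract's "strict evaluation (10/10 correct)" can be illustrated with a short sketch. This is not the authors' scoring code. It assumes that each benchmark instance is answered on 10 trials and counts as solved only if every trial is correct; the trial interpretation, the function names, and the toy data are all hypothetical.

```python
from typing import Dict, List


def strict_accuracy(trials: Dict[str, List[bool]], required: int = 10) -> float:
    """Fraction of instances that are correct on all `required` trials.

    Assumption: "10/10 correct" means 10 trials per instance, all correct.
    """
    if not trials:
        raise ValueError("no instances to score")
    solved = 0
    for instance_id, outcomes in trials.items():
        if len(outcomes) != required:
            raise ValueError(f"{instance_id}: expected {required} trials, got {len(outcomes)}")
        if all(outcomes):
            solved += 1
    return solved / len(trials)


def mean_accuracy(trials: Dict[str, List[bool]]) -> float:
    """Ordinary per-trial accuracy, shown for comparison with the strict score."""
    total = sum(len(outcomes) for outcomes in trials.values())
    if total == 0:
        raise ValueError("no trials to score")
    correct = sum(sum(outcomes) for outcomes in trials.values())
    return correct / total


# Toy data (hypothetical, not taken from the paper).
example = {
    "inst-a": [True] * 10,
    "inst-b": [True] * 9 + [False],  # one miss fails the strict criterion
    "inst-c": [True] * 10,
    "inst-d": [False] * 10,
}

print(strict_accuracy(example))  # 0.5
print(mean_accuracy(example))    # 0.725
```

In the toy data, a single miss on inst-b leaves per-trial accuracy at 0.725 but drops the strict score to 0.5, which shows why a strict criterion penalizes inconsistent answers more heavily than averaging does.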
