The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

Yubo Li; Lu Zhang; Tianchong Jiang; Ramayya Krishnan; Rema Padman

arXiv:2603.29025·cs.CL·April 23, 2026


TL;DR

This paper investigates how large language models follow salient surface heuristics even when those heuristics conflict with implicit, unstated constraints. It documents systematic failures and proposes a benchmark to measure and address this reasoning vulnerability.

Contribution

It introduces the Heuristic Override Benchmark (HOB) to evaluate how often surface cues override implicit constraints in LLM reasoning, and explores methods to mitigate the problem.

Findings

01

Under strict evaluation (10/10 correct), none of the 14 models tested exceeds 75% accuracy on HOB, and presence constraints are hardest (44%).

02

A minimal hint improves performance by an average of 15 percentage points.

03

Removing constraints worsens performance in most models, indicating conservative bias.

Abstract

Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword associations than compositional inference. The Heuristic Override Benchmark (HOB), 500 instances spanning 4 heuristic by 5 constraint families with minimal pairs and explicitness gradients, demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75%, and presence constraints are hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers +15 pp on average, suggesting the…
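The "strict evaluation (10/10 correct)" criterion mentioned in the abstract can be illustrated with a small scoring sketch. This is not the authors' evaluation code. It assumes that each benchmark instance is answered in ten independent trials and counts as solved only when all ten are correct. The function names `strict_accuracy` and `lenient_accuracy` and the `(instance_id, is_correct)` input format are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def _group_by_instance(trials):
    """Group (instance_id, is_correct) pairs into per-instance outcome lists."""
    per_instance = defaultdict(list)
    for instance_id, is_correct in trials:
        per_instance[instance_id].append(bool(is_correct))
    if not per_instance:
        raise ValueError("no trials given")
    return per_instance


def strict_accuracy(trials, n_required=10):
    """Fraction of instances whose n_required trials are ALL correct.

    Assumption: "10/10 correct" means ten repeated trials per instance.
    """
    per_instance = _group_by_instance(trials)
    passed = 0
    for instance_id, outcomes in per_instance.items():
        if len(outcomes) != n_required:
            raise ValueError(
                f"instance {instance_id!r} has {len(outcomes)} trials, "
                f"expected {n_required}"
            )
        passed += all(outcomes)
    return passed / len(per_instance)


def lenient_accuracy(trials):
    """Plain per-trial accuracy, shown only for comparison."""
    per_instance = _group_by_instance(trials)
    outcomes = [o for group in per_instance.values() for o in group]
    return sum(outcomes) / len(outcomes)


# Toy data: instance "a" is always right, instance "b" slips once in ten trials.
toy_trials = [("a", True)] * 10 + [("b", True)] * 9 + [("b", False)]
print(strict_accuracy(toy_trials))   # 0.5
print(lenient_accuracy(toy_trials))  # 0.95
```

On the toy data, a single miss in ten trials drops an instance to zero under the strict criterion while barely moving per-trial accuracy, which is why a strict score can sit well below a per-trial average for the same model.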


Code & Models

Datasets

yubol/Heuristic_Override_Benchmark
dataset · 254 downloads
