One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

Erfan Baghaei Potraghloo; Seyedarmin Azizi; Souvik Kundu; Massoud Pedram

arXiv:2604.13006 · cs.CL · April 28, 2026

TL;DR

Instruction-tuned large language models become significantly less helpful when trivial lexical constraints are applied, revealing a fragility in their response quality and underlying representations.

Contribution

This paper uncovers the vulnerability of instruction-tuned LLMs to simple lexical constraints, demonstrating a planning failure and the coupling of task competence to surface-form templates.

Findings

01

Simple lexical constraints cause 14-48% loss in response comprehensiveness.

02

Human and automated evaluations confirm genuine content loss under constraints.

03

Two-pass generation recovers 59-96% of response length, and predictive probes support the diagnosis of a planning failure.

Abstract

Instruction-tuned large language models produce helpful, structured responses, but how robust is this helpfulness under trivial constraints? We show that simple lexical constraints (banning a single punctuation character or common word) cause instruction-tuned LLMs to collapse their responses, losing 14-48% of comprehensiveness across seven models spanning five families (7B-70B, open- and closed-weight). A blinded human evaluation with 10 STEM-trained evaluators confirms genuine content loss, with information criteria degrading 1.5-2.3× more than surface criteria, a finding corroborated by over 4,100 automated pairwise comparisons (77-100% baseline preference) across three LLM judges from two model families. Diagnostic analysis identifies this as a planning failure: two-pass generation recovers 59-96% of response length, and linear probes on prompt…
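To make the abstract's setup concrete, the following is a minimal sketch of a lexical-constraint check and a two-pass loop. It is not the authors' implementation. The names `violates`, `length_recovery`, and `two_pass`, the rewrite prompt wording, and the whitespace-token length ratio are all assumptions made for illustration, and `generate` is a hypothetical callable standing in for any model call.

```python
import re
from typing import Callable


def violates(text: str, banned: str) -> bool:
    """Return True if `text` contains the banned token.

    A single non-alphanumeric character (e.g. a comma) matches anywhere;
    anything else is matched case-insensitively as a whole word.
    """
    if len(banned) == 1 and not banned.isalnum():
        return banned in text
    pattern = rf"\b{re.escape(banned)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def length_recovery(constrained: str, baseline: str) -> float:
    """Ratio of constrained to baseline length in whitespace tokens.

    Assumption: the paper may measure length differently.
    """
    base = len(baseline.split())
    return len(constrained.split()) / base if base else 0.0


def two_pass(prompt: str, banned: str, generate: Callable[[str], str]) -> str:
    """Pass 1 answers freely; pass 2 rewrites the draft under the constraint."""
    draft = generate(prompt)  # unconstrained first pass
    if not violates(draft, banned):
        return draft
    # Hypothetical rewrite instruction; the wording is illustrative only.
    rewrite_prompt = (
        f"Rewrite the following answer without using {banned!r}, "
        f"keeping all of its content:\n\n{draft}"
    )
    return generate(rewrite_prompt)
```

In use, `generate` would wrap a call to a real model, and the output of `two_pass` should still be checked with `violates`, since the rewrite pass is not guaranteed to respect the constraint.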
