The Atomic Instruction Gap: Instruction-Tuned LLMs Struggle with Simple, Self-Contained Directives

Henry Lim; Kwan Hui Lim

arXiv:2510.17388 · cs.CL · October 21, 2025

Open Access

TL;DR

This paper evaluates instruction-tuned large language models on their ability to follow simple, self-contained directives, revealing significant biases and weaknesses in their instruction adherence, especially with non-numeric labels and minimal guidance.

Contribution

The study systematically assesses 20 instruction-tuned LLMs (IT-LLMs) on instruction-following robustness, exposing format biases and highlighting the need for improved training strategies targeting atomic directives.

Findings

01

Even with explicit instructions, changing the option-label format causes large performance shifts.

02

Performance drops further without explicit instructions, and when option contents are removed, models fall below random-choice baselines except with numeric labels.

03

Larger models are more accurate but still inconsistent in following instructions.

Abstract

Instruction-tuned large language models (IT-LLMs) exhibit strong zero-shot reasoning, yet their ability to execute simple, self-contained instructions remains underexplored, despite this being foundational to complex instruction-following. We evaluate 20 IT-LLMs on modified MMLU and MMLU-Pro benchmarks, systematically varying the format of option labels (alphabetic, numeric, Roman) while keeping their meaning identical, under four paradigms: (1) With explicit instructions, label changes cause large performance shifts (e.g., -30.45% for Roman vs. numeric), revealing instruction-format bias. (2) Without instructions, performance drops further (up to -10.84%) and label sensitivity intensifies, underscoring the role of explicit guidance. (3) When option contents are removed, models fail random-choice baselines except with numeric labels, suggesting weak adherence to atomic…
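The label-variation setup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the prompt layout, and the instruction wording are all hypothetical, chosen only to show how one question can be rendered with alphabetic, numeric, or Roman labels, with or without an explicit instruction, and with or without option contents.

```python
# Hypothetical sketch of the label-format manipulation; not the paper's code.
# Ten labels per style cover MMLU (4 options) and MMLU-Pro (up to 10 options).
LABEL_STYLES = {
    "alphabetic": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "numeric": ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"],
    "roman": ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"],
}


def relabel_options(options, style):
    """Pair each option text with a label in the requested style."""
    labels = LABEL_STYLES[style]
    if len(options) > len(labels):
        raise ValueError("more options than available labels")
    return list(zip(labels, options))


def build_prompt(question, options, style,
                 with_instruction=True, keep_contents=True):
    """Render one multiple-choice prompt.

    with_instruction=False drops the explicit directive;
    keep_contents=False leaves only the bare labels.
    The instruction wording is an assumption for illustration.
    """
    lines = [question]
    for label, text in relabel_options(options, style):
        lines.append(f"{label}. {text}" if keep_contents else f"{label}.")
    if with_instruction:
        lines.append("Answer with the label of the correct option only.")
    return "\n".join(lines)


def random_baseline(num_options):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_options


if __name__ == "__main__":
    opts = ["Paris", "Rome", "Madrid", "Berlin"]
    for style in LABEL_STYLES:
        print(build_prompt("What is the capital of France?", opts, style))
        print()
```

Because only the labels change between variants, any accuracy difference across styles on the same questions reflects sensitivity to the label format rather than to the question content.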

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques