Control Illusion: The Failure of Instruction Hierarchies in Large Language Models

Yilin Geng; Haonan Li; Honglin Mu; Xudong Han; Timothy Baldwin; Omri Abend; Eduard Hovy; Lea Frermann

arXiv:2502.15851·cs.CL·March 23, 2026

Control Illusion: The Failure of Instruction Hierarchies in Large Language Models

Yilin Geng, Haonan Li, Honglin Mu, Xudong Han, Timothy Baldwin, Omri Abend, Eduard Hovy, Lea Frermann

PDF

1 Repo 1 Video

TL;DR

This paper systematically evaluates how well large language models enforce hierarchical instructions, revealing significant struggles and biases that challenge current control mechanisms and highlight the influence of societal hierarchies.

Contribution

It introduces a new evaluation framework for instruction hierarchy enforcement in LLMs and uncovers their limitations and biases in prioritizing instructions.

Findings

01

Models struggle with instruction prioritization.

02

System/user prompt separation is ineffective.

03

Societal hierarchy influences model behavior more than explicit instructions.

Abstract

Large language models (LLMs) are increasingly deployed with hierarchical instruction schemes, where certain instructions (e.g., system-level directives) are expected to take precedence over others (e.g., user messages). Yet, we lack a systematic understanding of how effectively these hierarchical control mechanisms work. We introduce a systematic evaluation framework based on constraint prioritization to assess how well LLMs enforce instruction hierarchies. Our experiments across six state-of-the-art LLMs reveal that models struggle with consistent instruction prioritization, even for simple formatting conflicts. We find that the widely-adopted system/user prompt separation fails to establish a reliable instruction hierarchy, and models exhibit strong inherent biases toward certain constraint types regardless of their priority designation. Interestingly, we also find that societal…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

yilin-geng/llm_instruction_conflicts
noneOfficial

Videos

Control Illusion: The Failure of Instruction Hierarchies in Large Language Models· underline