Many-Tier Instruction Hierarchy in LLM Agents
Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi

TL;DR
This paper introduces ManyIH, a new paradigm and benchmark for resolving instruction conflicts across many privilege levels in large language model agents, highlighting current models' limitations.
Contribution
The paper proposes ManyIH, a scalable instruction hierarchy framework, and introduces ManyIH-Bench, a benchmark with realistic, multi-level conflicting instructions for LLM agents.
Findings
Current models perform poorly (~40% accuracy) on ManyIH-Bench.
ManyIH-Bench includes 853 tasks with up to 12 conflicting instruction levels.
The benchmark spans 46 real-world agent scenarios.
Abstract
Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, other agents, and more), each carrying different levels of trust and authority. When these instructions conflict, agents must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a fixed, small set of privilege levels (typically fewer than five) defined by rigid role labels (e.g., system > user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving instruction conflicts among instructions with arbitrarily many privilege levels. We introduce ManyIH-Bench, the first benchmark for ManyIH. ManyIH-Bench requires models to navigate up to 12 levels…
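The rule the abstract states, that the highest-privilege instruction wins when instructions conflict, can be illustrated with a toy sketch. This is not the paper's method or the ManyIH-Bench format. The `Instruction` class, the integer `privilege` field (larger meaning more privileged), the `topic` key used to detect conflicts, and the `resolve` function are all hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    source: str      # hypothetical label, e.g. "system", "user", "tool"
    privilege: int   # assumption: a larger number means higher privilege
    topic: str       # hypothetical key; instructions sharing a topic conflict
    directive: str


def resolve(instructions):
    """For each topic, keep the directive from the highest-privilege source."""
    winners = {}
    for ins in instructions:
        current = winners.get(ins.topic)
        if current is None or ins.privilege > current.privilege:
            winners[ins.topic] = ins
    return {topic: ins.directive for topic, ins in winners.items()}


# Three sources disagree about the reply language; only one speaks about format.
instructions = [
    Instruction("system", 12, "language", "Reply in English."),
    Instruction("user", 7, "language", "Reply in French."),
    Instruction("tool", 2, "language", "Reply in German."),
    Instruction("user", 7, "format", "Use bullet points."),
]
resolved = resolve(instructions)
print(resolved)
```

Because privilege is an integer and not a fixed role label, the same function works unchanged for 3 levels or 12. In this sketch a tie on privilege keeps the instruction that appeared first, which is an arbitrary choice the abstract does not address.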
