Instruction Complexity Induces Positional Collapse in Adversarial LLM Evaluation

Jon-Paul Cacioli

arXiv:2604.27249 · cs.CL · May 1, 2026

TL;DR

This study investigates how instruction complexity influences whether small instruction-tuned LLMs engage with question content or fall back on positional shortcuts during adversarial evaluation. It finds three distinct response regimes rather than a single monotonic transition.

Contribution

It maps the boundary conditions under which increasing instruction complexity shifts models from content-aware responding to position-based shortcuts, highlighting the especially strong effect of multi-step instructions.

Findings

01

Vague adversarial instructions moderately reduce accuracy while preserving content engagement.

02

Standard sandbagging and capability-imitation instructions induce positional entropy collapse while retaining partial content engagement.

03

A multi-step (two-step, answer-aware avoidance) instruction causes extreme positional collapse, with responses concentrated on a single answer position.
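Response-position entropy, the distributional screen behind the collapse findings, can be illustrated with a short sketch. It assumes Shannon entropy in bits over the chosen answer letters, normalised by log2(10) because MMLU-Pro items have up to ten options (A to J). The function name `position_entropy` is hypothetical, and the paper's exact estimator is not stated in this summary.

```python
import math
from collections import Counter


def position_entropy(responses, n_options=10, normalize=True):
    """Shannon entropy (bits) of the distribution of chosen answer positions.

    Hypothetical helper for illustration.
    responses: iterable of chosen option letters, e.g. ["A", "C", "A", ...].
    n_options: number of answer positions (assumed 10, as in MMLU-Pro).
    normalize: if True, divide by log2(n_options) so the result lies in [0, 1].
    """
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to score")
    # H = sum_i p_i * log2(1 / p_i); written this way to avoid a -0.0 result
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / math.log2(n_options) if normalize else h


uniform = list("ABCDEFGHIJ") * 20   # answers spread evenly over all ten positions
collapsed = ["A"] * 200             # every answer on a single position
print(position_entropy(uniform))    # approximately 1.0
print(position_entropy(collapsed))  # 0.0
```

A uniform spread gives a normalised entropy of about 1.0, while answering "A" on every item gives 0.0, the extreme single-position collapse described in finding 03. Entropy alone cannot tell whether low-entropy responses still depend on the question, which is why the paper pairs it with a separate content-engagement criterion.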

Abstract

When instructed to underperform on multiple-choice evaluations, do language models engage with question content or fall back on positional shortcuts? We map the boundary between these regimes using a six-condition adversarial instruction-specificity gradient administered to two instruction-tuned LLMs (Llama-3-8B and Llama-3.1-8B) on 2,000 MMLU-Pro items. Distributional screening (response-position entropy) and an independent content-engagement criterion (difficulty-accuracy correlation) jointly characterise each condition. The gradient reveals three regimes rather than a monotonic transition. Vague adversarial instructions produce moderate accuracy reduction with preserved content engagement. Standard sandbagging and capability-imitation instructions produce positional entropy collapse with partial content engagement. A two-step answer-aware avoidance instruction produces extreme…
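The difficulty-accuracy correlation used as the content-engagement criterion can be sketched the same way. This sketch rests on several assumptions that this summary does not confirm. It treats item difficulty as a per-item score where higher means harder, for example an error rate from an honest baseline condition. It correlates that score with 0/1 correctness under an adversarial condition using Pearson correlation, which is the point-biserial correlation for a binary variable. The helper names `pearson` and `difficulty_accuracy_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def difficulty_accuracy_correlation(difficulty, correct):
    """Hypothetical content-engagement check.

    difficulty: per-item difficulty, higher = harder (assumed definition).
    correct: per-item correctness (bool or 0/1) under one adversarial condition.
    """
    return pearson(difficulty, [1.0 if c else 0.0 for c in correct])


# Toy data: the model still gets easy items right and hard items wrong.
r = difficulty_accuracy_correlation([0.1, 0.2, 0.3, 0.8, 0.9], [1, 1, 1, 0, 0])
print(round(r, 3))  # strongly negative
```

Under this sign convention, a clearly negative value means accuracy still tracks item difficulty, which suggests the model is reading the questions. A value near zero is consistent with answers chosen independently of content, as in positional responding.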