Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Hyunjong Ok; Jaeho Lee

arXiv:2601.14152·cs.CL·April 22, 2026

TL;DR

This paper investigates why prompt order affects language model accuracy in multiple-choice question answering and traces the effect to causal attention: when the context comes last, option tokens cannot attend to it, which creates an information bottleneck.

Contribution

The paper identifies causal attention as the mechanism behind prompt-order sensitivity in multiple-choice tasks: under a causal mask, tokens attend only to earlier positions, so the order of the prompt segments determines what each segment can see.

Findings

01

Placing context before the question and options (CQO) outperforms the reverse order (QOC) by over 14 percentage points.

02

Causal attention masks prevent options from attending to context in QOC prompts.

03

This masking mechanism accounts for the models' sensitivity to prompt order in multiple-choice question answering.
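The masking behaviour described in finding 02 can be illustrated with a toy example. The sketch below is not the authors' code: the helper names (`causal_mask`, `segment_positions`, `can_attend`) and the token counts are hypothetical, chosen only to show which segments are visible to the option tokens under a standard lower-triangular causal mask.

```python
def causal_mask(n):
    """Lower-triangular mask: position i may attend to position j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def segment_positions(order, lengths):
    """Map each segment label to the token positions it occupies for a given order."""
    positions, start = {}, 0
    for label in order:
        positions[label] = range(start, start + lengths[label])
        start += lengths[label]
    return positions


def can_attend(order, lengths, source, target):
    """True if any token of `source` can attend to any token of `target`."""
    pos = segment_positions(order, lengths)
    mask = causal_mask(sum(lengths[label] for label in order))
    return any(mask[i][j] for i in pos[source] for j in pos[target])


# Toy token counts (assumed): C = context, Q = question, O = options.
lengths = {"C": 4, "Q": 2, "O": 3}

cqo = can_attend("CQO", lengths, "O", "C")  # options come after the context
qoc = can_attend("QOC", lengths, "O", "C")  # options come before the context

print("CQO: options see context?", cqo)
print("QOC: options see context?", qoc)
```

In the QOC layout every context position has a higher index than every option position, so the mask entries linking options to context are all false. The reverse direction remains open: context tokens can still attend to the options.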

Abstract

Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation of a striking case: in multiple-choice question answering, placing context before the question and options (CQO) outperforms the reverse order (QOC) by over 14%p, consistently across a wide range of models and datasets. Through systematic architectural analysis, we identify causal attention as the core mechanism: in QOC prompts, the causal mask prevents option tokens from attending to context, creating an information bottleneck where context becomes invisible to options.
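To make the two orderings concrete, here is a minimal sketch of how a CQO and a QOC prompt might be assembled from the same three segments. The `build_prompt` helper, the "Context:/Question:/Options:" labels, and the sample text are assumptions for illustration only; the paper's actual prompt templates are not given on this page.

```python
def build_prompt(context, question, options, order="CQO"):
    """Assemble a multiple-choice prompt with segments in the given order.

    `order` is a permutation of "C" (context), "Q" (question), "O" (options).
    The segment labels below are a hypothetical template.
    """
    if sorted(order) != ["C", "O", "Q"]:
        raise ValueError("order must be a permutation of 'C', 'Q', 'O'")
    lettered = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    parts = {
        "C": f"Context: {context}",
        "Q": f"Question: {question}",
        "O": f"Options:\n{lettered}",
    }
    return "\n\n".join(parts[label] for label in order)


# Made-up sample item, used only to show the two layouts.
context = "The bridge opened in 1932 after eight years of construction."
question = "In which year did construction begin?"
options = ["1920", "1924", "1932", "1940"]

cqo_prompt = build_prompt(context, question, options, order="CQO")
qoc_prompt = build_prompt(context, question, options, order="QOC")

print(cqo_prompt)
print("-----")
print(qoc_prompt)
```

Both prompts contain identical text; only the segment order differs, which is the variable the abstract isolates.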
