Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
Hyunjong Ok, Jaeho Lee

TL;DR
This paper investigates how prompt order affects language model accuracy in multiple-choice question answering, and shows that causal attention creates an information bottleneck when the context is placed after the question and options.
Contribution
It identifies causal attention as the mechanism behind prompt-order sensitivity, explaining why the ordering of context, question, and options significantly influences performance in multiple-choice tasks.
Findings
Placing context before the question and options (CQO) outperforms the reverse order (QOC) by over 14 percentage points.
Causal attention masks prevent options from attending to context in QOC prompts.
The identified mechanism explains the sensitivity of models to prompt structure.
Abstract
Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation of a striking case: in multiple-choice question answering, placing context before the question and options (CQO) outperforms the reverse order (QOC) by over 14 percentage points, consistently over a wide range of models and datasets. Through systematic architectural analysis, we identify causal attention as the core mechanism: in QOC prompts, the causal mask prevents option tokens from attending to context, creating an information bottleneck where context becomes invisible to options.
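The masking argument in the abstract can be illustrated with a toy sketch. This is not the authors' analysis code. It only builds a standard causal (lower-triangular) mask and checks which prompt segments each segment's tokens can attend to. The function names and the token counts per segment are hypothetical, chosen for illustration.

```python
def causal_mask(n):
    """Causal mask: position i may attend to position j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def segment_visibility(order, lengths):
    """Map each segment to the set of segments its tokens can attend to."""
    # Label every token position with its segment, following the prompt order.
    labels = [seg for seg in order for _ in range(lengths[seg])]
    mask = causal_mask(len(labels))
    visible = {seg: set() for seg in order}
    for i, seg_i in enumerate(labels):
        for j, seg_j in enumerate(labels):
            if mask[i][j]:
                visible[seg_i].add(seg_j)
    return visible


# Hypothetical token counts for context (C), question (Q), and options (O).
lengths = {"C": 4, "Q": 2, "O": 3}

cqo = segment_visibility("CQO", lengths)
qoc = segment_visibility("QOC", lengths)

print("C" in cqo["O"])  # True: option tokens come after the context
print("C" in qoc["O"])  # False: option tokens come before the context
```

In the QOC layout the option positions precede the context, so under this mask their representations are computed without any access to it, which is the bottleneck the abstract describes.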
