Why Any-Order Autoregressive Models Need Two-Stream Attention: A Structural-Semantic Tradeoff
Patrick Pynadath, Ruqi Zhang

TL;DR
This paper investigates why any-order autoregressive models have needed two-stream attention to perform competitively, arguing that it lets the model balance competing semantic and structural demands on attention during sequence generation.
Contribution
The paper identifies a structural-semantic tradeoff in any-order generation and proposes Decoupled RoPE to address it, clarifying the role of two-stream attention.
Findings
Two-stream attention helps mitigate the structural-semantic tradeoff.
Decoupled RoPE improves performance at short sequence lengths.
Performance degrades as sequence length increases and the tradeoff diverges.
Abstract
Any-order autoregressive models (AO-ARMs) offer a promising path toward efficient masked diffusion by enabling native key-value caching, but competitive performance has so far required two-stream attention, typically motivated as a means of decoupling token content from position. In this work, we argue that two-stream attention may be serving a more subtle role. We identify a structural-semantic tradeoff in any-order generation: the hidden representation at each step must simultaneously attend to semantically informative tokens for prediction and structurally recent tokens for summarization, objectives that compete for attention capacity in a single stream but can specialize across two streams. To isolate this tradeoff from position-content separation, we propose Decoupled RoPE, a modification to rotary position embeddings that provides target position information without revealing…
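For readers unfamiliar with two-stream attention, the sketch below builds the two attention masks for a single generation order. It assumes the XLNet-style formulation, in which a content stream sees each token's own content and a query stream sees only the target position; the abstract does not say that the paper uses exactly this variant. The function name `two_stream_masks` and the example order are hypothetical, chosen for illustration only, and the sketch does not implement Decoupled RoPE.

```python
def two_stream_masks(order):
    """Build XLNet-style attention masks for one generation order.

    order[t] is the sequence position generated at step t.
    mask[i][j] is True when position i may attend to position j.
    (Assumed formulation; not taken from the paper.)
    """
    n = len(order)
    # step[pos] = the generation step at which position pos is produced
    step = [0] * n
    for t, pos in enumerate(order):
        step[pos] = t
    # Content stream: a position sees itself and everything generated earlier.
    content = [[step[j] <= step[i] for j in range(n)] for i in range(n)]
    # Query stream: a position sees only what was generated strictly earlier,
    # so it never has access to the content of the token it must predict.
    query = [[step[j] < step[i] for j in range(n)] for i in range(n)]
    return content, query


# Hypothetical order: position 2 is generated first, then 0, then 3, then 1.
order = [2, 0, 3, 1]
content_mask, query_mask = two_stream_masks(order)
```

In this setup the query stream carries the prediction for each target position while the content stream summarizes what has been generated so far, which is the division of labor the abstract describes as specializing across two streams.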
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare
