Interleaved Head Attention
Sai Surya Duvvuri, Chanakya Ekbote, Rachit Bansal, Rishabh Tiwari, Devvrit Khatri, David Brandfonbrener, Paul Liang, Inderjit Dhillon, Manzil Zaheer

TL;DR
Interleaved Head Attention (IHA) enhances multi-head attention by enabling cross-head communication, improving reasoning capabilities and efficiency in large language models, with demonstrated benefits on synthetic and real-world tasks.
Contribution
The paper introduces Interleaved Head Attention, a novel mechanism allowing cross-head mixing in multi-head attention to improve reasoning and parameter efficiency.
Findings
IHA outperforms standard MHA on synthetic polynomial and order-sensitive tasks.
IHA improves retrieval accuracy on RULER by 10-20%.
IHA enhances reasoning performance on the GSM8K and MATH-500 benchmarks.
Abstract
Multi-Head Attention (MHA) is the core computational primitive underlying modern Large Language Models (LLMs). However, MHA suffers from a fundamental linear scaling limitation: each attention head produces exactly one independent attention matrix, so the number of attention patterns grows only linearly with the number of heads, and there is no communication between heads during attention computation. This becomes problematic for multi-step reasoning, where correct answers depend on aggregating evidence from multiple parts of the context and composing latent token-to-token relations over a chain of intermediate inferences. To address this, we propose Interleaved Head Attention (IHA), which enables cross-head mixing by constructing multiple pseudo-heads per head, where each pseudo-query, pseudo-key, and pseudo-value is a learned linear combination of all original queries, keys, and values, respectively. Interactions between pseudo-query and pseudo-key heads induce up to …
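The cross-head mixing step described in the abstract can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation. The pseudo-head count `P`, the mixing tensors `mix_q`, `mix_k`, `mix_v` of shape `(H, P, H)`, the helper names `mix_heads` and `cross_head_attention`, and the choice to average the `P` pseudo-head outputs back into one output per head are all hypothetical. The abstract describes interactions between pseudo-query and pseudo-key heads; this sketch simplifies that to pairing pseudo-query `p` with pseudo-key `p` of the same head. It also omits causal masking and the output projection.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mix_heads(x: np.ndarray, mix: np.ndarray) -> np.ndarray:
    """Build pseudo-heads as linear combinations of ALL original heads.

    x:   (H, T, d)  per-head queries, keys, or values
    mix: (H, P, H)  hypothetical learned weights; mix[h, p, g] scales head g
    returns (H, P, T, d): P pseudo-heads for each of the H heads
    """
    return np.einsum("hpg,gtd->hptd", mix, x)


def cross_head_attention(q, k, v, mix_q, mix_k, mix_v):
    """Simplified sketch (assumption, not the paper's exact scheme):
    pseudo-query p attends with pseudo-key p of the same head, and the
    P pseudo-head outputs are averaged back into one output per head."""
    H, T, d = q.shape
    qp = mix_heads(q, mix_q)
    kp = mix_heads(k, mix_k)
    vp = mix_heads(v, mix_v)
    # Scaled dot-product scores for every pseudo-head: (H, P, T, T).
    scores = np.einsum("hptd,hpsd->hpts", qp, kp) / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    # Weighted sum of pseudo-values: (H, P, T, d).
    out = np.einsum("hpts,hpsd->hptd", attn, vp)
    return out.mean(axis=1), attn  # (H, T, d), (H, P, T, T)


# Usage with random inputs (no causal mask in this sketch).
rng = np.random.default_rng(0)
H, P, T, d = 4, 4, 5, 8
q, k, v = (rng.standard_normal((H, T, d)) for _ in range(3))
mix = [rng.standard_normal((H, P, H)) / np.sqrt(H) for _ in range(3)]
out, attn = cross_head_attention(q, k, v, *mix)
print(out.shape, attn.shape)  # (4, 5, 8) (4, 4, 5, 5)
```

Setting `P = 1` and each mixing tensor to a one-hot selector (`np.eye(H)[:, None, :]`) makes every pseudo-head equal to its original head, which recovers ordinary per-head scaled dot-product attention with independent attention matrices. Any other mixing weights let each head's attention pattern depend on the queries and keys of all heads. In this sketch the mixing tensors add `3 * H * P * H` parameters.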
