Adaptive Head Budgeting for Efficient Multi-Head Attention
Bilal Faye, Abdoulaye Mbaye, Hanane Azzag, Mustapha Lebbah

TL;DR
This paper introduces BudgetFormer, an adaptive multi-head attention Transformer that dynamically allocates attention heads based on input complexity, reducing computational costs while maintaining or improving performance.
Contribution
The paper proposes an adaptive multi-head attention mechanism, together with a training strategy for dynamic head allocation, aimed at lowering the cost of Transformer models without sacrificing accuracy.
Findings
Reduces inference FLOPs and memory usage.
Achieves comparable or better performance than standard multi-head attention.
Effectively adapts to varying input complexities in text classification.
Abstract
Transformers have become the dominant architecture across a wide range of domains, largely due to the effectiveness of multi-head attention in capturing diverse representation subspaces. However, standard multi-head attention activates all heads uniformly for every input, regardless of task requirements or input complexity. In many scenarios, particularly for coarse-grained tasks such as text classification, the relevant information is often global and does not require the full diversity of attention heads. As a consequence, using a fixed number of heads can introduce unnecessary computational cost or lead to suboptimal performance when the allocation does not match the input. To address this limitation, we introduce BudgetFormer, a Transformer architecture equipped with an adaptive multi-head attention mechanism that dynamically allocates computational resources. Our approach learns,…
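To make the idea of input-dependent head allocation concrete, here is a minimal sketch. It is not the authors' method: the abstract is truncated before the mechanism is described, so the sigmoid gate, the `threshold` and `min_heads` parameters, and every function name below are hypothetical choices made for illustration. The sketch scores each head from a pooled input vector, keeps only the heads that pass the threshold, and estimates how the attention cost scales with the number of active heads.

```python
import math


def head_gate_scores(pooled, gate_weights):
    """Return one sigmoid score per head from a pooled input vector.

    Hypothetical gate: gate_weights holds one weight row per head.
    """
    scores = []
    for row in gate_weights:
        logit = sum(w * x for w, x in zip(row, pooled))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def select_heads(scores, threshold=0.5, min_heads=1):
    """Keep heads scoring above threshold, falling back to the best min_heads."""
    ranked = sorted(range(len(scores)), key=lambda h: scores[h], reverse=True)
    kept = [h for h in ranked if scores[h] > threshold]
    if len(kept) < min_heads:
        kept = ranked[:min_heads]
    return sorted(kept)


def attention_macs(seq_len, head_dim, num_active_heads):
    """Rough multiply-accumulate count for the QK^T and attention-times-V products.

    Each product costs seq_len * seq_len * head_dim per head; projections,
    softmax, and the gate itself are ignored.
    """
    per_head = 2 * seq_len * seq_len * head_dim
    return num_active_heads * per_head


# Toy input: a 2-dimensional pooled representation and 4 heads.
pooled = [1.0, -1.0]
gate_weights = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0], [1.0, -1.0]]

scores = head_gate_scores(pooled, gate_weights)
active = select_heads(scores)

full_cost = attention_macs(seq_len=128, head_dim=64, num_active_heads=len(gate_weights))
budget_cost = attention_macs(seq_len=128, head_dim=64, num_active_heads=len(active))

print("active heads:", active)
print("cost ratio:", budget_cost / full_cost)
```

In this toy setup two of the four heads pass the gate, so the estimated cost of the two attention products is halved. A hard threshold like this is not differentiable, so a trainable version would need some relaxation or auxiliary objective; how BudgetFormer handles that is not stated in the text available here.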
