HydraViT: Stacking Heads for a Scalable ViT
Janek Haberer, Ali Hojjat, Olaf Landsiedel

TL;DR
HydraViT is a scalable Vision Transformer that stacks attention heads and varies the embedding dimension during training, so that a single model contains multiple subnetworks deployable across hardware with different constraints.
Contribution
The paper proposes HydraViT, a method that induces multiple subnetworks within a single ViT by varying the embedding dimension and the corresponding number of attention heads during training. This avoids training and storing a separate model for each hardware target.
Findings
Yields up to 10 subnetworks from a single model, covering diverse hardware constraints.
Improves accuracy by up to 5 percentage points at the same GMACs.
Improves accuracy by up to 7 percentage points at the same throughput on ImageNet-1K.
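To make the subnetwork count concrete, the sketch below enumerates ten configurations and picks the largest one that fits a parameter budget. It is only an illustration under stated assumptions: a 12-layer ViT, 64-dimensional heads, and head counts 3 through 12 are not given in this summary, the per-block cost uses the common rough estimate of 12·d² weights, and `approx_params` and `pick_subnetwork` are hypothetical helper names.

```python
# Hypothetical sketch: enumerate subnetworks and pick one for a budget.
# Assumptions (not from the summary): 64-dim heads, 3..12 heads, 12 layers.
HEAD_DIM = 64
NUM_LAYERS = 12
HEAD_COUNTS = range(3, 13)  # ten candidate subnetworks


def approx_params(num_heads):
    """Rough weight count: each block has ~4*d^2 (attention) + ~8*d^2 (MLP)."""
    d = num_heads * HEAD_DIM
    return NUM_LAYERS * 12 * d * d


def pick_subnetwork(param_budget):
    """Return the largest head count whose estimate fits the budget, else None."""
    fitting = [h for h in HEAD_COUNTS if approx_params(h) <= param_budget]
    return max(fitting) if fitting else None


configs = [(h, h * HEAD_DIM, approx_params(h)) for h in HEAD_COUNTS]
for heads, dim, params in configs:
    print(f"{heads:2d} heads  dim={dim:3d}  ~{params / 1e6:.1f}M weights")
```

A device would call something like `pick_subnetwork(25_000_000)` once at load time. The estimate ignores embeddings, biases, and normalization layers, so real budgets would need a margin.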
Abstract
The architecture of Vision Transformers (ViTs), particularly the Multi-head Attention (MHA) mechanism, imposes substantial hardware demands. Deploying ViTs on devices with varying constraints, such as mobile phones, requires multiple models of different sizes. However, this approach has limitations, such as training and storing each required model separately. This paper introduces HydraViT, a novel approach that addresses these limitations by stacking attention heads to achieve a scalable ViT. By repeatedly changing the size of the embedded dimensions throughout each layer and their corresponding number of attention heads in MHA during training, HydraViT induces multiple subnetworks. Thereby, HydraViT achieves adaptability across a wide spectrum of hardware environments while maintaining performance. Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable…
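The idea of inducing subnetworks by changing the embedding size and head count during training can be sketched with a single shared linear layer. This is a toy illustration, not the authors' implementation: the head width, the 3 to 12 head range, uniform sampling of the head count per step, and the names `sliced_linear` and `sample_num_heads` are all assumptions made for the example.

```python
import random

# Toy sizes; all are assumptions for illustration, not values from the abstract.
HEAD_DIM = 4
MIN_HEADS, MAX_HEADS = 3, 12
FULL_DIM = HEAD_DIM * MAX_HEADS

rng = random.Random(0)
# One shared weight matrix and bias for a single linear layer.
weight = [[rng.uniform(-0.1, 0.1) for _ in range(FULL_DIM)] for _ in range(FULL_DIM)]
bias = [0.0] * FULL_DIM


def sliced_linear(x, num_heads):
    """Apply the layer using only the first num_heads * HEAD_DIM rows and columns."""
    d = num_heads * HEAD_DIM
    return [sum(weight[i][j] * x[j] for j in range(d)) + bias[i] for i in range(d)]


def sample_num_heads():
    """Pick the subnetwork for one training step (uniform sampling is assumed)."""
    return rng.randint(MIN_HEADS, MAX_HEADS)


x = [rng.uniform(-1.0, 1.0) for _ in range(FULL_DIM)]
for step in range(3):
    k = sample_num_heads()
    y = sliced_linear(x, k)
    print(f"step {step}: {k} heads -> output width {len(y)}")
```

Because every smaller slice is a prefix of the larger ones, one stored set of weights serves all subnetworks, and a k-head subnetwork never reads inputs or weights beyond its first k heads' worth of dimensions.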
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Robotic Path Planning Algorithms
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention
