Enabling Efficient Hardware Acceleration of Hybrid Vision Transformer (ViT) Networks at the Edge
Joren Dumoulin, Pouya Houshmand, Vikram Jain, Marian Verhelst

TL;DR
This paper presents a hardware accelerator design for hybrid vision transformer networks, targeting resource-limited edge devices through a configurable processing element (PE) array, temporal loop re-ordering within layers, and layer fusion across inverted bottleneck layers.
Contribution
It introduces a configurable PE array and novel scheduling strategies to efficiently support diverse hybrid ViT layers on edge hardware.
Findings
Achieved an energy efficiency of 1.39 TOPS/W in a 28 nm CMOS implementation.
Supported all hybrid ViT layer types with a configurable PE array.
Reduced off-chip memory transfers through layer fusion and optimized scheduling.
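To put the reported efficiency figure in more familiar units, the short sketch below converts TOPS/W into average energy per operation. The helper name `picojoules_per_op` is hypothetical, and the conversion only restates the headline number; it says nothing about how the paper counts operations or which workload the figure was measured on.

```python
def picojoules_per_op(tops_per_watt: float) -> float:
    """Convert an efficiency in TOPS/W to average energy per operation in pJ.

    1 TOPS/W equals 1e12 operations per joule, so the energy per operation
    in picojoules is simply the reciprocal of the TOPS/W value.
    """
    ops_per_joule = tops_per_watt * 1e12
    joules_per_op = 1.0 / ops_per_joule
    return joules_per_op * 1e12  # joules -> picojoules


if __name__ == "__main__":
    # 1.39 TOPS/W is the figure reported in the findings above.
    print(f"{picojoules_per_op(1.39):.3f} pJ per operation")
```

Under this conversion, 1.39 TOPS/W corresponds to roughly 0.72 pJ per operation on average.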
Abstract
Hybrid vision transformers combine the elements of conventional neural networks (NN) and vision transformers (ViT) to enable lightweight and accurate detection. However, several challenges remain for their efficient deployment on resource-constrained edge devices. The hybrid models suffer from a widely diverse set of NN layer types and large intermediate data tensors, hampering efficient hardware acceleration. To enable their execution at the edge, this paper proposes innovations across the hardware-scheduling stack: (a) at the lowest level, a configurable PE array supports all hybrid ViT layer types; (b) temporal loop re-ordering within one layer enables hardware support for normalization and softmax layers, minimizing on-chip data transfers; (c) further scheduling optimization employs layer fusion across inverted bottleneck layers to drastically reduce off-chip memory transfers. The…
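The effect of layer fusion on off-chip traffic can be illustrated with a simple counting model. The sketch below is not the authors' scheduler: it assumes a hypothetical inverted bottleneck block (1x1 expansion, depthwise convolution, 1x1 projection) with stride 1, 8-bit activations, and weight traffic ignored, and it assumes that a fused schedule keeps both expanded intermediate tensors entirely on-chip. The class name, field names, and layer shapes are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class InvertedBottleneck:
    """Hypothetical inverted bottleneck block; all shapes are illustrative."""
    height: int
    width: int
    channels_in: int
    expansion: int          # channel expansion factor of the first 1x1 conv
    channels_out: int
    bytes_per_element: int = 1  # assumption: 8-bit activations

    def tensor_bytes(self, channels: int) -> int:
        # Stride 1 is assumed, so every tensor has the same spatial size.
        return self.height * self.width * channels * self.bytes_per_element

    def offchip_bytes_layer_by_layer(self) -> int:
        """Activation traffic when every layer reads and writes off-chip memory."""
        inp = self.tensor_bytes(self.channels_in)
        mid = self.tensor_bytes(self.channels_in * self.expansion)
        out = self.tensor_bytes(self.channels_out)
        expand = inp + mid      # 1x1 expansion: read input, write expanded tensor
        depthwise = mid + mid   # depthwise conv: read and write expanded tensor
        project = mid + out     # 1x1 projection: read expanded tensor, write output
        return expand + depthwise + project

    def offchip_bytes_fused(self) -> int:
        """Activation traffic when intermediates never leave the chip."""
        return self.tensor_bytes(self.channels_in) + self.tensor_bytes(self.channels_out)


if __name__ == "__main__":
    block = InvertedBottleneck(height=32, width=32, channels_in=64,
                               expansion=4, channels_out=64)
    unfused = block.offchip_bytes_layer_by_layer()
    fused = block.offchip_bytes_fused()
    print(f"layer-by-layer: {unfused} bytes, fused: {fused} bytes, "
          f"ratio: {unfused / fused:.1f}x")
```

With these made-up shapes the model gives 1,179,648 bytes of activation traffic layer by layer versus 131,072 bytes when fused, a 9x reduction. The real saving depends on on-chip buffer capacity, tiling, and weight traffic, none of which this model captures.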
