ViTA: A Vision Transformer Inference Accelerator for Edge Applications
Shashank Nag, Gourav Datta, Souvik Kundu, Nitin Chandrachoodan, Peter A. Beerel

TL;DR
ViTA is a configurable hardware accelerator for efficient inference of vision transformer models on resource-constrained edge devices. It achieves high hardware utilization and low power consumption, and it supports multiple models with minimal modifications.
Contribution
This paper introduces ViTA, a hardware accelerator optimized for vision transformers on edge devices. It relies on a head-level pipeline and inter-layer MLP optimizations.
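This summary does not detail how the paper's head-level pipeline is organized. The sketch below only illustrates the general idea of pipelining attention heads. It assumes each head's work splits into two stages, A and B, that run on separate hardware units, so that stage A of head h+1 can overlap stage B of head h. The stage split, the per-stage cycle counts, and the function names `sequential_cycles` and `pipelined_cycles` are hypothetical, not ViTA's documented design.

```python
def sequential_cycles(num_heads: int, t_a: int, t_b: int) -> int:
    """Cycles when every head runs stage A then stage B with no overlap."""
    return num_heads * (t_a + t_b)


def pipelined_cycles(num_heads: int, t_a: int, t_b: int) -> int:
    """Cycles when stage A of head h+1 overlaps stage B of head h.

    Hypothetical model: two independent hardware units (one per stage) and
    enough on-chip buffering between them that unit A never stalls.
    """
    a_finish = 0  # time at which unit A finishes its current head
    b_finish = 0  # time at which unit B finishes its current head
    for _ in range(num_heads):
        a_finish += t_a                    # unit A processes heads back to back
        b_start = max(a_finish, b_finish)  # B waits for this head's A output and for itself to be free
        b_finish = b_start + t_b
    return b_finish


if __name__ == "__main__":
    # 12 heads as in ViT-Base; the per-stage cycle counts are made up.
    heads, t_a, t_b = 12, 100, 80
    print(sequential_cycles(heads, t_a, t_b))  # 2160
    print(pipelined_cycles(heads, t_a, t_b))   # 1280
```

Under these assumptions the pipelined latency is t_a + (H - 1) * max(t_a, t_b) + t_b for H heads. The benefit is therefore largest when the two stages are balanced, where it approaches a 2x reduction over the sequential schedule.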
Findings
Achieves nearly 90% hardware utilization efficiency
Consumes only 0.88 W at 150 MHz
Supports multiple vision transformer models with minimal control logic changes
Abstract
Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer, have recently gained significant traction in computer vision tasks due to their ability to capture global relationships between features, which leads to superior performance. However, they are compute-heavy and difficult to deploy in resource-constrained edge devices. Existing hardware accelerators, including those for the closely related BERT transformer models, do not target highly resource-constrained environments. In this paper, we address this gap and propose ViTA, a configurable hardware accelerator for inference of vision transformer models that targets resource-constrained edge computing devices and avoids repeated off-chip memory accesses. We employ a head-level pipeline and inter-layer MLP optimizations, and can support several commonly used vision transformer models with changes solely in…
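The abstract names inter-layer MLP optimizations but does not describe them. A transformer MLP block computes GELU(x W1 + b1) W2 + b2, and in ViT-Base its hidden layer is 4x wider than the embedding (768 to 3072). One common way to avoid storing that full hidden activation on chip is to tile the hidden dimension. Each tile produced by the first layer is consumed immediately by the second, which accumulates partial sums.

The sketch below illustrates that general technique under this assumption; it is not ViTA's documented dataflow. The names `mlp_reference`, `mlp_fused`, and `chunk` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def gelu(v: float) -> float:
    """Exact GELU, the activation used in ViT's MLP block."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def mlp_reference(x: Matrix, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Matrix:
    """y = GELU(x W1 + b1) W2 + b2, materializing the full hidden activation."""
    d_in, d_hidden, d_out = len(w1), len(w1[0]), len(w2[0])
    h = [[gelu(sum(row[i] * w1[i][j] for i in range(d_in)) + b1[j])
          for j in range(d_hidden)] for row in x]
    return [[sum(hr[j] * w2[j][k] for j in range(d_hidden)) + b2[k]
             for k in range(d_out)] for hr in h]


def mlp_fused(x: Matrix, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector,
              chunk: int) -> Matrix:
    """Same result, but only `chunk` hidden values per token are live at a time.

    Each tile of hidden units is computed by layer 1 and immediately consumed
    by layer 2, whose output accumulates partial sums across tiles.
    """
    d_in, d_hidden, d_out = len(w1), len(w1[0]), len(w2[0])
    y = [list(b2) for _ in x]  # start each output row from the layer-2 bias
    for start in range(0, d_hidden, chunk):
        cols = range(start, min(start + chunk, d_hidden))
        for t, row in enumerate(x):
            # Layer 1 for this tile of hidden units only.
            h_tile = [gelu(sum(row[i] * w1[i][j] for i in range(d_in)) + b1[j])
                      for j in cols]
            # Layer 2 partial product: add this tile's contribution.
            for k in range(d_out):
                y[t][k] += sum(h * w2[j][k] for h, j in zip(h_tile, cols))
    return y
```

Both functions compute the same output, up to floating-point summation order, because the second layer's sum over hidden units splits cleanly across tiles. The trade-off is that the layer-2 output accumulators stay live for the whole computation, while the hidden-activation buffer shrinks to one tile.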
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Image and Video Retrieval Techniques · Advanced Memory and Neural Computing
Methods: Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Dropout · Dropout · Byte Pair Encoding · Linear Warmup With Linear Decay · Residual Connection
