ViTA: A Vision Transformer Inference Accelerator for Edge Applications

Shashank Nag; Gourav Datta; Souvik Kundu; Nitin Chandrachoodan; Peter A. Beerel

arXiv:2302.09108·cs.AR·September 13, 2023·1 cites

TL;DR

ViTA is a configurable hardware accelerator designed for efficient inference of vision transformer models on resource-constrained edge devices, achieving high utilization, low power consumption, and supporting multiple models with minimal modifications.

Contribution

This paper introduces ViTA, a hardware accelerator for vision transformer inference on edge devices, built around a head-level pipeline and inter-layer MLP optimizations.

Findings

01. Achieves nearly 90% hardware utilization efficiency

02. Consumes only 0.88 W of power at 150 MHz

03. Supports multiple vision transformer models with minimal control logic changes

Abstract

Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer, have recently gained significant traction in computer vision tasks due to their ability to capture the global relation between features, which leads to superior performance. However, they are compute-heavy and difficult to deploy on resource-constrained edge devices. Existing hardware accelerators, including those for the closely related BERT transformer models, do not target highly resource-constrained environments. In this paper, we address this gap and propose ViTA, a configurable hardware accelerator for inference of vision transformer models, targeting resource-constrained edge computing devices and avoiding repeated off-chip memory accesses. We employ a head-level pipeline and inter-layer MLP optimizations, and can support several commonly used vision transformer models with changes solely in…
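To give a rough sense of why the abstract calls these models compute-heavy, the sketch below counts multiply-accumulate operations (MACs) in one transformer encoder layer. It is an illustration only, not ViTA's design or the paper's own analysis: the function name `encoder_layer_macs` is hypothetical, and the example dimensions (197 tokens, embedding width 768, MLP width 3072, 12 layers) are the standard ViT-Base/16 configuration at 224x224 input, assumed here rather than taken from this page. Softmax, layer normalization, biases, and the patch embedding are ignored.

```python
def encoder_layer_macs(tokens: int, dim: int, mlp_dim: int) -> dict:
    """Approximate MAC count for one transformer encoder layer.

    Hypothetical helper for illustration; it counts only the matrix
    multiplications and ignores softmax, layer norm, and biases.
    """
    qkv = 3 * tokens * dim * dim      # Q, K, V linear projections
    scores = tokens * tokens * dim    # Q @ K^T, summed over all heads
    context = tokens * tokens * dim   # attention weights @ V, all heads
    out_proj = tokens * dim * dim     # attention output projection
    mlp = 2 * tokens * dim * mlp_dim  # the two MLP linear layers
    parts = {
        "qkv": qkv,
        "scores": scores,
        "context": context,
        "out_proj": out_proj,
        "mlp": mlp,
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    # Assumed ViT-Base/16 dimensions (not stated on this page).
    layer = encoder_layer_macs(tokens=197, dim=768, mlp_dim=3072)
    num_layers = 12
    for name, macs in layer.items():
        print(f"{name:>8}: {macs / 1e6:8.1f} MMACs per layer")
    print(f"encoder stack: {num_layers * layer['total'] / 1e9:.1f} GMACs")
```

Under these assumptions the MLP block accounts for more MACs than the attention block in each layer, which is consistent with the abstract listing MLP optimizations among its main techniques.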

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Image and Video Retrieval Techniques · Advanced Memory and Neural Computing

Methods: Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Dropout · Dropout · Byte Pair Encoding · Linear Warmup With Linear Decay · Residual Connection