SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers

Alberto Marchisio; Davide Dura; Maurizio Capra; Maurizio Martina; Guido Masera; Muhammad Shafique

arXiv:2304.03986·cs.LG·April 26, 2023·1 cites

TL;DR

SwiftTron is a specialized hardware accelerator optimized for quantized Transformer models, enabling efficient deployment on resource-constrained devices by supporting key operations with low power and area footprint.

Contribution

We introduce SwiftTron, a novel hardware accelerator tailored for quantized Transformers, supporting multiple operations and designed for efficient ASIC implementation.

Findings

01 Executes RoBERTa-base in 1.83 ns

02 Consumes 33.64 mW power

03 Occupies 273 mm^2 area

Abstract

Transformers' compute-intensive operations pose enormous challenges for their deployment in resource-constrained EdgeAI / tinyML devices. As an established neural network compression technique, quantization reduces the hardware computational and memory resources. In particular, fixed-point quantization is desirable to ease the computations using lightweight blocks, like adders and multipliers, of the underlying hardware. However, deploying fully-quantized Transformers on existing general-purpose hardware, generic AI accelerators, or specialized architectures for Transformers with floating-point units might be infeasible and/or inefficient. Towards this, we propose SwiftTron, an efficient specialized hardware accelerator designed for Quantized Transformers. SwiftTron supports the execution of different types of Transformers' operations (like Attention, Softmax, GELU, and Layer…
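
The abstract's point about fixed-point quantization can be illustrated with a minimal sketch. This is not SwiftTron's datapath or the authors' code. The function names (quantize, int_dot, requantize), the 8-bit width, and the power-of-two scales are all assumptions chosen for illustration. The sketch shows how a dot product can be computed with integer multiplies, adds, and a shift once the operands are quantized.

```python
def quantize(values, scale, bits=8):
    """Map real values to signed integers: q = clamp(round(x / scale))."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def int_dot(a_q, b_q):
    """Dot product using only integer multiplies and adds."""
    acc = 0
    for a, b in zip(a_q, b_q):
        acc += a * b
    return acc


def requantize(acc, multiplier, shift):
    """Rescale an integer accumulator with one integer multiply and a right shift.

    Approximates acc * (multiplier / 2**shift), rounding to nearest.
    """
    return (acc * multiplier + (1 << (shift - 1))) >> shift


# Hypothetical inputs and scales (assumptions, not values from the paper).
x = [0.5, -1.0, 0.25, 0.75]
w = [1.0, 0.5, -0.5, 0.25]
x_scale = w_scale = out_scale = 1 / 64

x_q = quantize(x, x_scale)
w_q = quantize(w, w_scale)
acc = int_dot(x_q, w_q)  # integer accumulator, real value = acc * x_scale * w_scale

# x_scale * w_scale / out_scale = 1/64, expressed as multiplier / 2**shift.
multiplier, shift = 16384, 20
y_q = requantize(acc, multiplier, shift)

exact = sum(a * b for a, b in zip(x, w))
print(y_q * out_scale, exact)
```

The scales here are powers of two, so the quantized result matches the floating-point dot product exactly; with arbitrary scales a small rounding error would remain.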

Code & Models

Repositories

albertomarchisio/swifttron (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Softmax