Row-wise Accelerator for Vision Transformer

Hong-Yi Wang and Tian-Sheuan Chang

arXiv:2205.03998·cs.AR·May 10, 2022

Open Access

TL;DR

This paper introduces a row-wise hardware accelerator for vision transformers that enhances efficiency by decomposing operations into dot products and sharing weights, achieving high throughput with low resource usage.

Contribution

It proposes a novel row-wise scheduling hardware design for vision transformers, enabling efficient execution and reduced memory usage.

Findings

01 Achieves 403.2 GOPS throughput at 600MHz

02 Uses only 262K gates and 149KB SRAM buffer

03 Demonstrates efficient hardware implementation in 40nm CMOS
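
To put the throughput figure in perspective, the short calculation below converts it into operations per clock cycle. It uses only the two numbers quoted above; the conversion to multiply-accumulate units assumes the common convention of counting one MAC as two operations, which this page does not state.

```python
# Figures quoted in the findings above.
THROUGHPUT_GOPS = 403.2
CLOCK_MHZ = 600

# Operations completed per clock cycle at peak throughput.
ops_per_cycle = round(THROUGHPUT_GOPS * 1e9 / (CLOCK_MHZ * 1e6))

# Assumption (not stated in the source): one multiply-accumulate = 2 operations.
macs_per_cycle = ops_per_cycle // 2

print(ops_per_cycle, macs_per_cycle)
```

Under that assumption, the quoted throughput corresponds to a few hundred multiply-accumulate operations issued every cycle.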

Abstract

Following its success in natural language processing, the transformer has attracted significant attention for vision applications in recent years due to its excellent performance. However, existing deep learning hardware accelerators for vision cannot execute this structure efficiently due to significant differences in model architecture. This paper therefore proposes a hardware accelerator for vision transformers with row-wise scheduling, which decomposes the major operations in vision transformers into a single dot product primitive for unified and efficient execution. Furthermore, sharing weights across columns allows data to be reused and reduces memory usage. The implementation in TSMC 40nm CMOS technology requires only a 262K gate count and a 149KB SRAM buffer for 403.2 GOPS throughput at a 600MHz clock frequency.
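
The idea of reducing the major operations to one dot product primitive can be illustrated in software. The sketch below is not the paper's dataflow, which this page does not describe in detail; the function names `dot`, `rowwise_matmul`, and `attention_scores` are hypothetical, and the loop order is only one plausible way to show a weight column being fetched once and reused across input rows.

```python
def dot(a, b):
    """The single primitive: dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rowwise_matmul(rows, weights):
    """Compute rows @ weights using only dot().

    rows:    list of input row vectors (e.g. token embeddings)
    weights: matrix given as a list of rows, shape (len(rows[0]), n_out)
    """
    columns = list(zip(*weights))  # each weight column is fetched once
    out = [[0] * len(columns) for _ in rows]
    for j, col in enumerate(columns):
        # Assumed reuse pattern: the same column serves every input row.
        for i, row in enumerate(rows):
            out[i][j] = dot(row, col)
    return out


def attention_scores(queries, keys):
    """Q K^T expressed with the same primitive: one dot() per score."""
    return [[dot(q, k) for k in keys] for q in queries]


X = [[1, 2], [3, 4]]
W = [[1, 0, 2], [0, 1, 3]]
Y = rowwise_matmul(X, W)       # linear layer
S = attention_scores(X, X)     # unscaled attention scores
```

Both the linear projection and the attention score computation go through the same `dot` call, which is the sense in which a single primitive can serve different layers.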

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Image and Video Retrieval Techniques · Infrared Target Detection Methodologies