Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design

Hongxiang Fan; Thomas Chau; Stylianos I. Venieris; Royson Lee; Alexandros Kouris; Wayne Luk; Nicholas D. Lane; Mohamed S. Abdelfattah

arXiv:2209.09570 · cs.AR · September 21, 2022

Open Access

TL;DR

This paper introduces FABNet, a hardware-friendly sparse variant of attention-based neural networks, and an adaptable FPGA accelerator that jointly optimizes algorithm and hardware for significant speedups and efficiency gains.

Contribution

It proposes a unified butterfly sparsity pattern for both attention and feed-forward networks, and a configurable hardware accelerator for improved performance and scalability.
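The butterfly sparsity pattern can be illustrated with the classic butterfly factorization. An n×n linear map, with n a power of two, is written as a product of log2(n) sparse factors, each made of n/2 independent 2×2 blocks. A matrix-vector product then costs O(n log n) instead of O(n²), and the layer stores 2n·log2(n) weights instead of n². The sketch below is a minimal pure-Python illustration under that assumption. It is not FABNet's layer definition or the accelerator's dataflow, and all helper names (`butterfly_apply`, `random_butterfly`, `hadamard_butterfly`, `to_dense`, `weight_counts`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Assumption: a generic butterfly factorization, not FABNet's exact layer.
Block = Tuple[float, float, float, float]  # 2x2 block [[a, b], [c, d]]
Stage = List[Block]                        # n/2 blocks per stage


def _check_pow2(n: int) -> int:
    """Return log2(n), rejecting sizes that are not powers of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return n.bit_length() - 1


def random_butterfly(n: int, seed: int = 0) -> List[Stage]:
    """Hypothetical helper: log2(n) stages of random 2x2 blocks."""
    rng = random.Random(seed)
    return [[tuple(rng.uniform(-1.0, 1.0) for _ in range(4)) for _ in range(n // 2)]
            for _ in range(_check_pow2(n))]


def hadamard_butterfly(n: int) -> List[Stage]:
    """All blocks [[1, 1], [1, -1]]: the product is the Walsh-Hadamard transform."""
    return [[(1, 1, 1, -1)] * (n // 2) for _ in range(_check_pow2(n))]


def butterfly_apply(stages: List[Stage], x: Sequence[float]) -> List[float]:
    """Compute y = B_L ... B_1 x; each stage mixes only pairs (i, i + h)."""
    y = list(x)
    n = len(y)
    if len(stages) != _check_pow2(n):
        raise ValueError("need log2(len(x)) stages")
    h = 1  # stride between the two elements of a butterfly pair
    for stage in stages:
        k = 0  # index of the next 2x2 block in this stage
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b, c, d = stage[k]
                k += 1
                u, v = y[i], y[i + h]
                y[i], y[i + h] = a * u + b * v, c * u + d * v
        h *= 2
    return y


def to_dense(stages: List[Stage], n: int) -> List[List[float]]:
    """Materialise the product of the sparse factors as a dense n x n matrix."""
    cols = [butterfly_apply(stages, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def weight_counts(n: int) -> Tuple[int, int]:
    """(dense weights, butterfly weights) for an n x n linear map."""
    return n * n, 2 * n * _check_pow2(n)


print(butterfly_apply(hadamard_butterfly(4), [1, 1, 1, 1]))  # [4, 0, 0, 0]
print(weight_counts(1024))  # (1048576, 20480)
```

Setting every block to [[1, 1], [1, -1]] turns the factorization into the Walsh-Hadamard transform, which gives a convenient correctness check. `to_dense` shows that the product of the sparse factors is still a full n×n matrix, even though no stage ever touches more than two inputs per output.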

Findings

1. FABNet matches the accuracy of the vanilla Transformer with 10-66x less computation.
2. The FPGA accelerator achieves a 14.2-23.2x speedup over state-of-the-art accelerators.
3. The system is up to 273.8x faster than optimized CPU and GPU implementations on low-power devices.

Abstract

Attention-based neural networks have become pervasive in many AI tasks. Despite their excellent algorithmic performance, the attention mechanism and the feed-forward network (FFN) demand excessive computational and memory resources, which often compromises their hardware performance. Although various sparse variants have been introduced, most approaches only focus on mitigating the quadratic scaling of attention at the algorithm level, without explicitly considering how efficiently their methods map onto real hardware designs. Furthermore, most efforts focus on either the attention mechanism or the FFN without jointly optimizing both parts, so most current designs lack scalability when dealing with different input lengths. This paper systematically considers the sparsity patterns in different variants from a hardware perspective. On the algorithmic…
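The quadratic scaling mentioned in the abstract comes from the n×n score matrix QKᵀ, which costs on the order of n²·d multiply-accumulates for n tokens of width d. A butterfly-structured transform avoids materializing that matrix. As an assumed illustration only (this excerpt does not say how FABNet uses such a transform), the sketch below implements FNet-style Fourier token mixing, Re(FFT_seq(FFT_hidden(x))), with a radix-2 Cooley-Tukey FFT whose inner loop is exactly a butterfly. The names `fft` and `fourier_mix` are hypothetical.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t           # butterfly: sum branch
        out[k + n // 2] = even[k] - t  # butterfly: difference branch
    return out


def fourier_mix(x: List[List[float]]) -> List[List[float]]:
    """Assumed FNet-style mixing of a (seq_len x hidden) input, O(n log n) per axis."""
    rows = [fft(row) for row in x]  # FFT along the hidden dimension, per token
    seq_len, hidden = len(rows), len(rows[0])
    # FFT along the sequence dimension, one column per hidden unit
    cols = [fft([rows[t][h] for t in range(seq_len)]) for h in range(hidden)]
    # keep the real part, transposed back to (seq_len x hidden)
    return [[cols[h][t].real for h in range(hidden)] for t in range(seq_len)]


mixed = fourier_mix([[1.0, 1.0], [1.0, 1.0]])
print(mixed)  # constant input: DC entry 4.0, every other entry zero
```

Every token still influences every output position, as with attention, but no n×n matrix is ever built. The radix-2 recursion requires power-of-two lengths; other lengths would need padding or a mixed-radix FFT.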

Taxonomy

Topics: Advanced Neural Network Applications · Image Enhancement Techniques · CCD and CMOS Imaging Sensors

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Residual Connection · Dense Connections