Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design
Hongxiang Fan, Thomas Chau, Stylianos I. Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed S. Abdelfattah

TL;DR
This paper introduces FABNet, a hardware-friendly sparse variant of attention-based neural networks, and an adaptable FPGA accelerator that jointly optimizes algorithm and hardware for significant speedups and efficiency gains.
Contribution
It proposes a unified butterfly sparsity pattern for both attention and feed-forward networks, and a configurable hardware accelerator for improved performance and scalability.
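As a rough illustration of what a butterfly sparsity pattern looks like, the sketch below builds a generic butterfly factorization in NumPy. It is not FABNet's actual layer, which this summary does not specify. The function names (`butterfly_factor`, `butterfly_factors`, `apply_butterfly`), the random weights, and the pairing rule (index `i` with `i XOR stride`) are assumptions made for illustration only.

```python
import numpy as np


def butterfly_factor(n, stride, rng):
    """Dense n x n view of one butterfly factor (hypothetical helper).

    Row i has nonzeros only at columns i and i XOR stride,
    so the factor holds 2n weights instead of n * n.
    """
    factor = np.zeros((n, n))
    for i in range(n):
        factor[i, i] = rng.standard_normal()
        factor[i, i ^ stride] = rng.standard_normal()
    return factor


def butterfly_factors(n, seed=0):
    """Return log2(n) butterfly factors with strides 1, 2, 4, ..., n/2."""
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two")
    rng = np.random.default_rng(seed)
    num_stages = n.bit_length() - 1  # equals log2(n) for a power of two
    return [butterfly_factor(n, 1 << k, rng) for k in range(num_stages)]


def apply_butterfly(factors, x):
    """Apply the factors in order; equivalent to one n x n linear map."""
    for factor in factors:
        x = factor @ x
    return x


n = 8
factors = butterfly_factors(n)
x = np.arange(float(n))
y = apply_butterfly(factors, x)

# Weight count of the factorized form versus a dense n x n layer.
sparse_weights = sum(np.count_nonzero(f) for f in factors)
dense_weights = n * n
```

Under these assumptions the factorized form stores 2n log2(n) weights rather than n^2, a gap that widens as n grows. The factors are materialized as dense arrays here only for readability; a practical implementation would store just the two weights per row.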
Findings
FABNet achieves accuracy comparable to the vanilla Transformer with 10-66x less computation.
The FPGA accelerator achieves a 14.2-23.2x speedup over state-of-the-art accelerators.
The system is up to 273.8x faster than CPU/GPU baselines on low-power devices.
Abstract
Attention-based neural networks have become pervasive in many AI tasks. Despite their excellent algorithmic performance, the use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources, which often compromises their hardware performance. Although various sparse variants have been introduced, most approaches only focus on mitigating the quadratic scaling of attention on the algorithm level, without explicitly considering the efficiency of mapping their methods on real hardware designs. Furthermore, most efforts only focus on either the attention mechanism or the FFNs but without jointly optimizing both parts, causing most of the current designs to lack scalability when dealing with different input lengths. This paper systematically considers the sparsity patterns in different variants from a hardware perspective. On the algorithmic…
Taxonomy
Topics: Advanced Neural Network Applications · Image Enhancement Techniques · CCD and CMOS Imaging Sensors
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Residual Connection · Dense Connections
