An Efficient Sparse Hardware Accelerator for Spike-Driven Transformer
Zhengke Li, Wendong Mao, Siyu Zhang, Qiwei Dong, Zhongfeng Wang

TL;DR
This paper introduces a sparse hardware accelerator for the Spike-driven Transformer that exploits spike sparsity to reduce computation, power, and latency, achieving higher throughput and energy efficiency than existing SNN accelerators.
Contribution
The paper presents a new encoding method for spike positions and dedicated modules for spike-driven self-attention, enabling efficient processing of sparse spikes in Transformer models.
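The sketch below illustrates how an encoding of spike positions lets spike-driven self-attention skip zeros. It rests on several assumptions of ours that this page does not state. Spikes are binary token-by-channel matrices. The "encoding" is modeled as a per-row list of the column indices that hold a 1, which is a CSR-like stand-in; the paper's actual hardware format may differ. Self-attention follows the SDSA formulation of the original Spike-driven Transformer as we understand it, SDSA(Q, K, V) = Q ⊗ SN(SUM_c(K ⊗ V)). Here ⊗ is the Hadamard product, SUM_c sums over tokens for each channel, and SN is a spiking neuron, reduced here to a plain threshold. The names `encode_spikes` and `sdsa_encoded` and the `threshold` parameter are hypothetical.

```python
from typing import List

Spikes = List[List[int]]  # binary matrix: rows = tokens, columns = channels


def encode_spikes(spikes: Spikes) -> List[List[int]]:
    """Hypothetical position encoding: for each token row, keep only the
    column indices that carry a spike (value 1); zeros are never stored."""
    return [[c for c, s in enumerate(row) if s] for row in spikes]


def sdsa_encoded(q: Spikes, k: Spikes, v: Spikes, threshold: int = 1) -> Spikes:
    """Sketch of SDSA(Q, K, V) = Q ⊗ SN(SUM_c(K ⊗ V)) on encoded spikes.

    Assumption: SN is modeled as a threshold neuron that fires when the
    per-channel count reaches `threshold`."""
    n_tokens, n_channels = len(q), len(q[0])
    k_pos, v_pos = encode_spikes(k), encode_spikes(v)
    # K ⊗ V on binary spikes is a logical AND, i.e. an intersection of the
    # position lists; SUM_c then counts, per channel, how many tokens fired.
    col_sum = [0] * n_channels
    for kp, vp in zip(k_pos, v_pos):
        for c in set(kp) & set(vp):
            col_sum[c] += 1
    # SN: a channel mask that spikes where the count reaches the threshold.
    mask = {c for c, total in enumerate(col_sum) if total >= threshold}
    # Q ⊗ mask: only the spike positions of Q are visited.
    out = [[0] * n_channels for _ in range(n_tokens)]
    for t, qp in enumerate(encode_spikes(q)):
        for c in qp:
            if c in mask:
                out[t][c] = 1
    return out
```

For instance, with Q = [[1,0,1,0],[0,1,0,0]], K = [[1,1,0,0],[1,0,1,0]] and V = [[1,0,0,1],[1,0,1,0]], the per-channel counts of K ⊗ V are [2, 0, 1, 0], so with a threshold of 1 the output is [[1,0,1,0],[0,0,0,0]]. Every step touches only stored spike positions and uses no multiplications, which is the property a sparse accelerator can exploit.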
Findings
Up to 13.24× throughput improvement
Up to 1.33× energy efficiency gain
Effective exploitation of spike sparsity in hardware
Abstract
Recently, large models such as Vision Transformer and BERT have garnered significant attention due to their exceptional performance. However, their extensive computational requirements lead to considerable power and hardware resource consumption. Brain-inspired computing, characterized by its spike-driven methods, has emerged as a promising approach for low-power hardware implementation. In this paper, we propose an efficient sparse hardware accelerator for the Spike-driven Transformer. We first design a novel encoding method that encodes the position information of valid activations and skips non-spike values. This method enables us to use encoded spikes for executing the calculations of linear, max-pooling and spike-driven self-attention. Compared with the single spike input design of conventional SNN accelerators that primarily focus on convolution-based spiking computations, the…
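To illustrate why encoded spikes remove work in the linear and max-pooling layers, here is a minimal Python sketch built on our own assumptions rather than on the paper's hardware design. With binary input spikes, a linear layer y = xW reduces to summing the rows of W selected by the spike positions, so zero inputs are skipped and no multiplications are needed. Max-pooling over binary spikes reduces to a logical OR, so a window outputs 1 as soon as one encoded position falls inside it. The helper names are hypothetical, and max-pooling is shown in 1-D with non-overlapping windows for brevity.

```python
from typing import List


def encode_spikes(row: List[int]) -> List[int]:
    """Hypothetical encoding: indices of the spikes (1s) in a binary vector."""
    return [i for i, s in enumerate(row) if s]


def linear_encoded(positions: List[int], weight: List[List[float]]) -> List[float]:
    """y = x @ W for a binary x given only its spike positions.

    Each spike selects one row of W and the output is the sum of those rows,
    so zero inputs cost nothing and no multiplications are performed."""
    out = [0.0] * len(weight[0])
    for i in positions:
        for j, w in enumerate(weight[i]):
            out[j] += w
    return out


def maxpool_encoded(positions: List[int], length: int, window: int) -> List[int]:
    """1-D max-pooling (stride = window) of a binary vector of `length` entries.

    For binary spikes, max over a window equals OR, so a window fires if any
    encoded position falls inside it."""
    n_out = length // window
    out = [0] * n_out
    for i in positions:
        w = i // window
        if w < n_out:  # drop a trailing partial window, as floor-mode pooling does
            out[w] = 1
    return out
```

For example, the spike vector [0, 1, 0, 0, 1, 1] encodes to [1, 4, 5]. With a 6×2 weight matrix whose rows are [1, 2], [3, 4], ..., [11, 12], `linear_encoded` adds rows 1, 4 and 5 to give [23.0, 26.0], and `maxpool_encoded` with a window of 2 returns [1, 0, 1]. The work in both functions scales with the number of spikes rather than with the vector length.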
