Spikformer: When Spiking Neural Network Meets Transformer
Zhaokun Zhou, Yuesheng Zhu, Chao He, Yaowei Wang, Shuicheng Yan, Yonghong Tian, Li Yuan

TL;DR
This paper introduces Spikformer, a spiking transformer model that combines self-attention with the biological plausibility of spiking neural networks (SNNs), achieving state-of-the-art image classification accuracy among SNNs with low energy consumption.
Contribution
It proposes Spiking Self Attention (SSA) and the Spikformer framework, integrating self-attention into SNNs for improved efficiency and performance.
Findings
Spikformer outperforms existing SNN frameworks on image classification.
Achieves 74.81% top-1 accuracy on ImageNet with 66.3M parameters.
SSA mechanism is efficient, sparse, and avoids multiplication, reducing energy consumption.
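The claim that SSA "avoids multiplication" follows from the spike form. When one operand of a matrix product holds only 0/1 spikes, each product term is either 0 or a copy of the other operand's entry, so the product can be computed by adding the rows that the spikes select. The sketch below illustrates this in pure Python under stated assumptions: matrices are nested lists, the left operand is binary, and the function name `spike_matmul_add_only` is hypothetical rather than taken from the authors' code.

```python
from typing import List

Matrix = List[List[int]]


def spike_matmul_add_only(a: Matrix, b: Matrix) -> Matrix:
    """Compute a @ b where `a` contains only 0/1 spikes, using additions only.

    Hypothetical helper for illustration. Because a[i][k] is 0 or 1, the term
    a[i][k] * b[k][j] is either 0 or b[k][j], so we add b's row when a spikes.
    """
    m = len(b)
    assert all(len(row) == m for row in a), "inner dimensions must match"
    p = len(b[0])
    out = [[0] * p for _ in range(len(a))]
    for i, row_a in enumerate(a):
        for k, spike in enumerate(row_a):
            if spike:  # event-driven: silent (zero) inputs cost nothing
                for j in range(p):
                    out[i][j] += b[k][j]  # accumulate, never multiply
    return out


q = [[1, 0, 1], [0, 1, 1]]          # 2 tokens x 3 spike channels
k_t = [[1, 0], [1, 1], [0, 1]]      # transposed keys: 3 channels x 2 tokens
print(spike_matmul_add_only(q, k_t))  # [[1, 1], [1, 2]]
```

Each output entry counts how many channels spike in both a query and a key, which is why the result is identical to an ordinary matrix product on binary inputs.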
Abstract
We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism. The former offers an energy-efficient and event-driven paradigm for deep learning, while the latter has the ability to capture feature dependencies, enabling Transformer to achieve good performance. It is intuitively promising to explore the marriage between them. In this paper, we consider leveraging both self-attention capability and biological properties of SNNs, and propose a novel Spiking Self Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer). The SSA mechanism in Spikformer models the sparse visual feature by using spike-form Query, Key, and Value without softmax. Since its computation is sparse and avoids multiplication, SSA is efficient and has low computational energy consumption. It is shown that Spikformer with SSA can…
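The abstract's description of SSA (spike-form Query, Key, and Value, no softmax) can be sketched for a single time step. This is a hedged reading, not the authors' implementation. It assumes Q, K, and V are already 0/1 spike matrices (learned projections, normalization, and the output layer are omitted), that the attention output is Q K^T V multiplied by a constant scale s before a spiking neuron, and that the spiking neuron can be replaced by a hypothetical threshold function with no leak or reset. The default s = 0.125 is likewise an assumption. Because spike-form Q and K are non-negative, Q K^T is already a non-negative attention map without any softmax.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain reference product. When either operand is a 0/1 spike matrix, each
    # term is 0 or a copy of the other operand's entry, so it reduces to adds.
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def spike_neuron(x: Matrix, threshold: float = 1.0) -> Matrix:
    # Hypothetical single-step stand-in for a spiking neuron (no membrane
    # leak or reset): emit a spike (1) wherever the input reaches threshold.
    return [[1 if val >= threshold else 0 for val in row] for row in x]


def spiking_self_attention(q: Matrix, k: Matrix, v: Matrix,
                           scale: float = 0.125, threshold: float = 1.0) -> Matrix:
    """Sketch of SN(Q K^T V * s) for one time step; there is no softmax."""
    attn = matmul(q, transpose(k))  # N x N, non-negative spike-coincidence counts
    out = matmul(attn, v)           # N x D, still non-negative integers
    return spike_neuron([[x * scale for x in row] for row in out], threshold)


q = [[1, 0], [1, 1], [0, 0]]  # 3 tokens x 2 channels, all spike-form
k = [[1, 0], [0, 1], [1, 1]]
v = [[1, 1], [0, 1], [1, 0]]
print(spiking_self_attention(q, k, v, scale=0.5))  # [[1, 0], [1, 1], [0, 0]]
```

With the default `scale=0.125` the same inputs produce no spikes at all, which shows how strongly the choice of s and the firing threshold gates what the block passes on.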
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural dynamics and brain function
Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Multi-Head Attention · Adam · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization
