Combining Aggregated Attention and Transformer Architecture for Accurate and Efficient Performance of Spiking Neural Networks
Hangming Zhang, Alexander Sboev, Roman Rybka, Qiang Yu

TL;DR
This paper introduces SAFormer, a novel architecture that combines the low-power benefits of Spiking Neural Networks with the high performance of Transformers through a simplified attention mechanism and enhanced feature extraction.
Contribution
The paper proposes the Spike Aggregated Self-Attention mechanism and a Depthwise Convolution Module to effectively integrate SNNs with Transformers, reducing energy consumption while improving accuracy.
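As a rough illustration of the depthwise-convolution idea named above, the following is a minimal sketch and not the authors' Depthwise Convolution Module, whose exact layout is not given here. The function names `depthwise_conv2d` and `conv_weight_count`, the toy spike maps, and the kernels are all assumptions made for the example. It shows the defining property of a depthwise convolution: each channel is filtered by its own kernel with no mixing across channels, which is why the weight count is far lower than in a standard convolution.

```python
def depthwise_conv2d(x, kernels):
    """Depthwise 2D convolution (cross-correlation form), stride 1, no padding.

    x:       list of C channels, each an H x W list of numbers
    kernels: list of C kernels, each a k x k list; kernel c is applied
             only to channel c, so channels are never mixed
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        fmap = [
            [
                sum(
                    channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k)
                    for dj in range(k)
                )
                for j in range(w - k + 1)
            ]
            for i in range(h - k + 1)
        ]
        out.append(fmap)
    return out


def conv_weight_count(c_in, c_out, k, depthwise=False):
    """Number of weights (bias excluded) in a k x k convolution layer."""
    if depthwise:
        return c_in * k * k  # one k x k filter per input channel
    return c_in * c_out * k * k


# Hypothetical input: two 3x3 binary spike maps, one per channel.
spikes = [
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
]
# Hypothetical kernels, one per channel.
kernels = [
    [[1, 1], [1, 1]],   # channel 0: sums each 2x2 window
    [[1, 0], [0, -1]],  # channel 1: top-left minus bottom-right
]
result = depthwise_conv2d(spikes, kernels)
```

For a 3 x 3 layer with 64 input and 64 output channels, `conv_weight_count(64, 64, 3)` gives 36864 weights for a standard convolution, against 576 for the depthwise variant.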
Findings
SAFormer outperforms state-of-the-art SNNs in accuracy
SAFormer reduces energy consumption significantly
The model combines low power consumption with high accuracy
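The energy claim above can be made concrete with the kind of back-of-the-envelope estimate often used in the SNN literature. This is a sketch under stated assumptions: the page does not say which energy model the authors use, the per-operation costs (4.6 pJ per multiply-accumulate, 0.9 pJ per accumulate) are the figures commonly quoted for 45 nm hardware, and the operation count, firing rate, and timestep count in the example are invented inputs, not results from the paper.

```python
# Assumed per-operation energy costs in picojoules (commonly quoted 45 nm figures).
E_MAC_PJ = 4.6  # one multiply-accumulate, as used by dense ANN layers
E_AC_PJ = 0.9   # one accumulate, as used by spike-driven layers


def ann_energy_pj(macs):
    """Energy of a dense layer that performs `macs` multiply-accumulates."""
    return macs * E_MAC_PJ


def snn_energy_pj(macs, firing_rate, timesteps):
    """Energy of the spiking counterpart of that layer.

    Only inputs that actually spike trigger an accumulate, so the
    synaptic operation count scales with the average firing rate
    and with the number of simulation timesteps.
    """
    synaptic_ops = macs * firing_rate * timesteps
    return synaptic_ops * E_AC_PJ


# Hypothetical layer: 1e6 MACs, 20% average firing rate, 4 timesteps.
dense = ann_energy_pj(1_000_000)
spiking = snn_energy_pj(1_000_000, firing_rate=0.2, timesteps=4)
ratio = spiking / dense
```

Under this model the saving depends entirely on sparsity: a high firing rate or a large number of timesteps can erase the advantage of the cheaper accumulate operation.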
Abstract
Spiking Neural Networks have attracted significant attention in recent years due to their distinctive low-power characteristics. Meanwhile, Transformer models, known for their powerful self-attention mechanisms and parallel processing capabilities, have demonstrated exceptional performance across various domains, including natural language processing and computer vision. Despite the significant advantages of both SNNs and Transformers, directly combining the low-power benefits of SNNs with the high performance of Transformers remains challenging. Specifically, while the sparse computing mode of SNNs contributes to reduced energy consumption, traditional attention mechanisms depend on dense matrix computations and complex softmax operations. This reliance poses significant challenges for effective execution in low-power scenarios. Given the tremendous success of Transformers in deep…
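The mismatch the abstract describes, sparse binary spikes on one side and dense softmax attention on the other, can be illustrated with a small sketch. This is not the paper's Spike Aggregated Self-Attention, whose formulation is not given on this page; the function names and the toy matrices are assumptions for the example. The first function is ordinary scaled dot-product attention, which needs multiplications, exponentials, and divisions. The second computes attention scores for binary spike inputs using additions only, and only at positions where the query fired.

```python
import math


def dense_softmax_attention(q, k, v):
    """Standard scaled dot-product attention on real-valued nested lists."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k
        ]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out


def spike_scores(q_spikes, k_spikes):
    """Q K^T for binary (0/1) spike matrices using additions only.

    For each query row, only the indices where the query fired are
    visited, so the work shrinks as the spike trains get sparser.
    """
    scores = []
    for qi in q_spikes:
        active = [idx for idx, s in enumerate(qi) if s]
        scores.append([sum(kj[idx] for idx in active) for kj in k_spikes])
    return scores


# Hypothetical binary spike matrices: 2 queries, 3 keys, 4 features.
q = [[1, 0, 1, 0], [0, 0, 0, 1]]
k = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
scores = spike_scores(q, k)

# Dense attention over the same inputs, with a constant value column.
dense_out = dense_softmax_attention(q, k, [[1.0], [1.0], [1.0]])
```

The spike-based scores are integer coincidence counts, identical to what a dense matrix product would return for the same binary inputs, but obtained without any multiplication or softmax.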
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural Networks and Applications · Neural dynamics and brain function
Methods: Linear Layer · Dropout · Convolution · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax
