Combining Aggregated Attention and Transformer Architecture for Accurate and Efficient Performance of Spiking Neural Networks

Hangming Zhang; Alexander Sboev; Roman Rybka; Qiang Yu

arXiv:2412.13553 · cs.NE · December 19, 2024

TL;DR

This paper introduces SAFormer, a novel architecture that combines the low-power benefits of Spiking Neural Networks with the high performance of Transformers through a simplified attention mechanism and enhanced feature extraction.

Contribution

The paper proposes the Spike Aggregated Self-Attention mechanism and a Depthwise Convolution Module to effectively integrate SNNs with Transformers, reducing energy consumption while improving accuracy.
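
A depthwise convolution applies one kernel per channel, with no mixing across channels, which keeps the parameter and operation count low. The sketch below illustrates only that core operation in plain Python, assuming stride 1, zero padding, and an odd kernel size. The function name `depthwise_conv2d` is hypothetical, and the actual layout of the paper's Depthwise Convolution Module (kernel size, normalization, spiking neurons around the convolution) is not described on this page.

```python
from typing import List

Matrix = List[List[float]]


def depthwise_conv2d(x: List[Matrix], kernels: List[Matrix]) -> List[Matrix]:
    """Apply one k x k kernel per channel (no cross-channel mixing).

    x:       C channels, each an H x W grid.
    kernels: C kernels, each k x k with k odd.
    Uses stride 1 and zero padding of k // 2, so each output channel is H x W.
    Like deep learning frameworks, this computes a cross-correlation
    (the kernel is not flipped).
    """
    if len(x) != len(kernels):
        raise ValueError("depthwise conv needs exactly one kernel per channel")
    out = []
    for channel, kernel in zip(x, kernels):
        h, w = len(channel), len(channel[0])
        k = len(kernel)
        pad = k // 2
        result = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        r, c = i + di - pad, j + dj - pad
                        if 0 <= r < h and 0 <= c < w:  # positions outside the grid count as zero
                            acc += channel[r][c] * kernel[di][dj]
                result[i][j] = acc
        out.append(result)
    return out
```

For example, with an identity kernel (a single 1 at the center) on channel 0 and a 3 x 3 all-ones kernel on channel 1, channel 0 passes through unchanged while channel 1 becomes a local box sum of its own values only. Mixing information across channels would then be left to a separate pointwise (1 x 1) step, which this sketch does not include.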

Findings

1. SAFormer outperforms state-of-the-art SNNs in accuracy.
2. SAFormer reduces energy consumption significantly.
3. The model demonstrates effective low-power, high-performance capabilities.
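
Energy savings in the SNN literature are often estimated by counting operations rather than by measuring on hardware. A dense ANN pays for multiply-accumulates (MACs), while a spiking layer with binary inputs needs only accumulates (ACs), and only when a spike arrives. The sketch below assumes the commonly cited 45 nm estimates of 4.6 pJ per 32-bit float MAC and 0.9 pJ per AC. This page does not state whether SAFormer's reported figures use exactly this accounting, and the function names and operation counts here are hypothetical.

```python
# Per-operation energy in picojoules: commonly cited 45 nm CMOS estimates
# (an assumption here, not a figure taken from the SAFormer paper).
E_MAC_PJ = 4.6   # 32-bit float multiply-accumulate
E_AC_PJ = 0.9    # 32-bit float accumulate (addition only)
PJ_TO_MJ = 1e-9  # 1 pJ = 1e-12 J = 1e-9 mJ


def ann_energy_mj(flops: float) -> float:
    """Dense (non-spiking) layer: every operation is counted as a MAC."""
    return flops * E_MAC_PJ * PJ_TO_MJ


def snn_energy_mj(flops: float, firing_rate: float, timesteps: int) -> float:
    """Spiking layer: binary spikes turn MACs into ACs, and only spikes cost energy.

    Synaptic operations (SOPs) = firing_rate * timesteps * flops, where
    firing_rate is the average fraction of inputs that spike per timestep.
    """
    if not 0.0 <= firing_rate <= 1.0:
        raise ValueError("firing_rate must be in [0, 1]")
    sops = firing_rate * timesteps * flops
    return sops * E_AC_PJ * PJ_TO_MJ


if __name__ == "__main__":
    flops = 1e9  # hypothetical layer size
    print(f"ANN: {ann_energy_mj(flops):.2f} mJ")
    print(f"SNN: {snn_energy_mj(flops, firing_rate=0.15, timesteps=4):.2f} mJ")
```

Run as a script, this prints 4.60 mJ for the dense layer and 0.54 mJ for the spiking one. Under this accounting the spiking layer only comes out ahead while `firing_rate * timesteps` stays below roughly 4.6 / 0.9 ≈ 5.1, so sparse firing and few timesteps are what make the savings possible.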

Abstract

Spiking Neural Networks (SNNs) have attracted significant attention in recent years due to their distinctive low-power characteristics. Meanwhile, Transformer models, known for their powerful self-attention mechanisms and parallel processing capabilities, have demonstrated exceptional performance across various domains, including natural language processing and computer vision. Despite the significant advantages of both SNNs and Transformers, directly combining the low-power benefits of SNNs with the high performance of Transformers remains challenging. Specifically, while the sparse computing mode of SNNs contributes to reduced energy consumption, traditional attention mechanisms depend on dense matrix computations and complex softmax operations. This reliance makes them difficult to execute efficiently in low-power scenarios. Given the tremendous success of Transformers in deep…
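
To illustrate the contrast the abstract draws, the sketch below compares standard scaled dot-product attention (dense products plus a row-wise softmax) with a softmax-free attention over binary spike matrices, in the style of Spikformer's spiking self-attention. It is not the paper's Spike Aggregated Self-Attention, whose details are not given here. The names `spike_attention` and `scale` are hypothetical, and the spiking neuron that would normally follow the attention output is omitted. With 0/1 inputs, every score reduces to counting and addition.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Standard scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    weights = []
    for row in matmul(q, k_t):
        scaled = [s / math.sqrt(d) for s in row]
        m = max(scaled)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])  # one exp and one division per entry
    return matmul(weights, v)


def spike_attention(q: Matrix, k: Matrix, v: Matrix, scale: float = 1.0) -> Matrix:
    """Softmax-free attention over binary (0/1) spike matrices (Spikformer-style).

    Each q.k score is the number of positions where both vectors spike, and
    multiplying by a binary v only selects which scores to add, so no
    multiplications, exponentials or divisions are needed before the final scale.
    """
    scores = [[sum(1 for a, b in zip(qi, kj) if a and b) for kj in k] for qi in q]
    n_cols = len(v[0])
    return [[scale * sum(s for s, v_row in zip(row, v) if v_row[c]) for c in range(n_cols)]
            for row in scores]
```

For example, with `q = [[1, 0, 1], [0, 1, 0]]`, `k = [[1, 0, 1], [1, 1, 0]]` and `v = [[1, 0], [0, 1]]`, `spike_attention(q, k, v)` returns `[[2.0, 1.0], [0.0, 1.0]]`, the same values as the dense product q kᵀ v, but computed with additions alone.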

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Applications · Neural dynamics and brain function

Methods: Linear Layer · Dropout · Convolution · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax