Breaking Global Self-Attention Bottlenecks in Transformer-based Spiking Neural Networks with Local Structure-Aware Self-Attention
Lingdong Li, Hangming Zhang, Qiang Yu

TL;DR
This paper introduces LSFormer, a novel Transformer-based Spiking Neural Network that uses local structure-aware self-attention and spiking response pooling to improve efficiency and accuracy in vision tasks.
Contribution
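The paper proposes LSFormer, which combines local dilated window mechanisms with new pooling and self-attention modules to address two limitations of existing Transformer-based SNNs: max pooling that keeps only the strongest response, and global self-attention with quadratic cost.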
Findings
Achieves state-of-the-art performance on Tiny-ImageNet and N-CALTECH101 datasets.
Outperforms existing Transformer-based SNNs by 4.3% and 8.6% in top-1 accuracy.
Demonstrates potential for energy-efficient large-scale vision applications.
Abstract
Transformer-based Spiking Neural Networks (SNNs) integrate SNNs with global self-attention and have demonstrated impressive performance. However, existing Transformer-based SNNs suffer from two fundamental limitations. First, they typically employ max pooling layers to reduce the size of feature maps, but max pooling captures only the strongest response and fails to comprehensively preserve representative regional features. Second, global self-attention computes interactions between all pairs of features, resulting in computational redundancy and quadratic computational complexity, which conflicts with the sparse and energy-efficient characteristics of SNNs. To address these challenges, we develop Local Structure-Aware Spiking Transformer (LSFormer), a novel Transformer-based Spiking Neural Network that incorporates Spiking Response Pooling (SPooling) and Local Structure-Aware Spiking…
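The two limitations named in the abstract can be illustrated with a small sketch. It is not LSFormer's SPooling or its attention module, whose details are not given here; all function names, the 4x4 spike map, the 196-token count, and the 9-token window are assumptions chosen only for illustration. The first part shows that max pooling over a binary spike map returns the same value whether one neuron or all four neurons in a window fire. The second part compares the number of query-key pairs in global attention with a fixed-size local window.

```python
# Illustrative sketch only. These helpers are hypothetical and are NOT
# the paper's SPooling or local structure-aware attention modules.

def max_pool_2x2(spikes):
    """2x2 max pooling with stride 2 over a binary spike map (list of rows)."""
    h, w = len(spikes), len(spikes[0])
    return [
        [max(spikes[i][j], spikes[i][j + 1],
             spikes[i + 1][j], spikes[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

def spike_count_2x2(spikes):
    """Number of spikes in each 2x2 window, shown for comparison."""
    h, w = len(spikes), len(spikes[0])
    return [
        [spikes[i][j] + spikes[i][j + 1]
         + spikes[i + 1][j] + spikes[i + 1][j + 1]
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

def global_attention_pairs(num_tokens):
    """Every token attends to every token: quadratic in the token count."""
    return num_tokens * num_tokens

def windowed_attention_pairs(num_tokens, window_size):
    """Each token attends to at most window_size tokens (borders ignored)."""
    return num_tokens * window_size

# Assumed toy input: one window fires once, another fires four times.
spikes = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

print(max_pool_2x2(spikes))     # the sparse and dense windows both give 1
print(spike_count_2x2(spikes))  # counts keep the difference (1 vs 4)

# Assumed sizes: a 14x14 feature map (196 tokens) and a 3x3 window (9 tokens).
print(global_attention_pairs(196))
print(windowed_attention_pairs(196, 9))
```

With these assumed sizes the windowed variant scales linearly in the number of tokens, which is the general motivation for local attention; the specific dilated-window design used by LSFormer may differ.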
