Spiking Transformer with Spatial-Temporal Attention
Donghyun Lee, Yuhang Li, Youngeun Kim, Shiting Xiao, Priyadarshini Panda

TL;DR
This paper introduces STAtten, a novel spike-based transformer architecture that effectively captures both spatial and temporal dependencies, significantly enhancing performance on various static and neuromorphic datasets while maintaining computational efficiency.
Contribution
STAtten is a simple, efficient architecture that integrates spatial-temporal attention into spike-based transformers without requiring major redesigns.
Findings
Improves accuracy on CIFAR10/100, ImageNet, CIFAR10-DVS, and N-Caltech101 datasets.
Maintains computational complexity comparable to spatial-only models.
Demonstrates significant performance gains through extensive experiments.
Abstract
Spike-based Transformer presents a compelling and energy-efficient alternative to traditional Artificial Neural Network (ANN)-based Transformers, achieving impressive results through sparse binary computations. However, existing spike-based transformers predominantly focus on spatial attention while neglecting crucial temporal dependencies inherent in spike-based processing, leading to suboptimal feature representation and limited performance. To address this limitation, we propose Spiking Transformer with Spatial-Temporal Attention (STAtten), a simple and straightforward architecture that efficiently integrates both spatial and temporal information in the self-attention mechanism. STAtten introduces a block-wise computation strategy that processes information in spatial-temporal chunks, enabling comprehensive feature capture while maintaining the same computational complexity as…
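The block-wise idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes softmax-free attention over binary spikes, so that (QK^T)V can be regrouped as Q(K^T V), and it assumes the time steps are split into equal, non-overlapping blocks. The function name `blockwise_st_attention` and the parameters `block_size` and `scale` are hypothetical names chosen for this example.

```python
import numpy as np


def blockwise_st_attention(q, k, v, block_size, scale=1.0):
    """Illustrative block-wise spatial-temporal attention (hypothetical sketch).

    q, k, v: arrays of shape (T, N, D) holding binary spikes, where T is the
    number of time steps, N the number of tokens, and D the feature size.
    Tokens attend to every token in the same block of `block_size` time steps.
    """
    T, N, D = q.shape
    if T % block_size != 0:
        raise ValueError("T must be divisible by block_size")
    n_blocks = T // block_size

    # Merge the time steps of each block into the token axis:
    # (T, N, D) -> (n_blocks, block_size * N, D).
    qb = q.reshape(n_blocks, block_size * N, D)
    kb = k.reshape(n_blocks, block_size * N, D)
    vb = v.reshape(n_blocks, block_size * N, D)

    # Assumption: no softmax, so K^T V can be computed first.
    # This yields one (D, D) matrix per block instead of a
    # (block_size * N, block_size * N) attention map.
    kv = np.einsum("bnd,bne->bde", kb, vb)
    out = np.einsum("bnd,bde->bne", qb, kv) * scale

    # Restore the original (T, N, D) layout.
    return out.reshape(T, N, D)


# Example call with random binary spikes (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
spikes_q = rng.integers(0, 2, size=(4, 3, 5)).astype(np.float64)
spikes_k = rng.integers(0, 2, size=(4, 3, 5)).astype(np.float64)
spikes_v = rng.integers(0, 2, size=(4, 3, 5)).astype(np.float64)
mixed = blockwise_st_attention(spikes_q, spikes_k, spikes_v, block_size=2)
```

With `block_size=1` the sketch reduces to attention within each time step only. Under the stated assumption, the two matrix products cost on the order of T·N·D² multiply-adds for any block size, because the (D, D) product is formed before the queries are applied.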
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural dynamics and brain function · Tactile and Sensory Interactions
Methods: Attention Is All You Need · Focus · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing
