Masked Spiking Transformer
Ziqing Wang, Yuetong Fang, Jiahang Cao, Qiang Zhang, Zhongrui Wang, Renjing Xu

TL;DR
This paper introduces the Masked Spiking Transformer (MST), a novel framework that combines ANN-to-SNN conversion with spike masking to enhance energy efficiency and performance in spiking neural networks.
Contribution
It proposes the MST framework with Random Spike Masking (RSM), which prunes redundant spikes to reduce energy consumption without sacrificing performance.
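As a rough illustration of the masking idea, here is a minimal sketch. It assumes RSM drops each binary spike independently, with probability equal to the masking ratio. The helper name random_spike_mask, the NumPy implementation, and the (timesteps, tokens, channels) tensor layout are hypothetical choices for this example, not the authors' code. The sketch also does not model where in the network the mask is applied, or how spike counts translate into the reported power figures.

```python
import numpy as np


def random_spike_mask(spikes: np.ndarray, mask_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Hypothetical RSM sketch: drop each spike independently with probability mask_ratio."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    # rng.random draws from [0, 1), so each entry is kept with probability 1 - mask_ratio
    keep = rng.random(spikes.shape) >= mask_ratio
    # No 1 / (1 - mask_ratio) rescaling (unlike dropout), so the output stays binary
    return spikes * keep


rng = np.random.default_rng(0)
# Assumed layout: (timesteps, tokens, channels), with roughly 20% of entries spiking
spikes = (rng.random((4, 196, 384)) < 0.2).astype(np.float32)
masked = random_spike_mask(spikes, mask_ratio=0.75, rng=rng)
print(f"fraction of spikes kept: {masked.sum() / spikes.sum():.3f}")  # close to 1 - 0.75
```

This sketch deliberately omits dropout-style rescaling. That keeps the tensor binary, so each masked spike corresponds to a synaptic operation that a downstream layer never has to perform.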
Findings
26.8% power reduction at 75% masking ratio
Maintains performance comparable to unmasked models
Improves energy efficiency in SNNs
Abstract
The combination of Spiking Neural Networks (SNNs) and Transformers has attracted significant attention due to their potential for high energy efficiency and high performance. However, existing works on this topic typically rely on direct training, which can lead to suboptimal performance. To address this issue, we propose to leverage the benefits of the ANN-to-SNN conversion method to combine SNNs and Transformers, resulting in significantly improved performance over existing state-of-the-art SNN models. Furthermore, inspired by the quantal synaptic failures observed in the nervous system, which reduce the number of spikes transmitted across synapses, we introduce a novel Masked Spiking Transformer (MST) framework that incorporates a Random Spike Masking (RSM) method to prune redundant spikes and reduce energy consumption without sacrificing performance. Our experimental results…
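Rate-based ANN-to-SNN conversion generally relies on the fact that an integrate-and-fire (IF) neuron's firing rate over many timesteps approximates a clipped ReLU activation. The following is a minimal, generic sketch of that principle. It assumes constant input currents, a reset-by-subtraction IF neuron, and a half-threshold initial membrane potential. The function name if_neuron_rates and all parameter values are hypothetical. This is not MST's conversion pipeline, and it does not address how Transformer-specific operations such as softmax or layer normalization are handled.

```python
import numpy as np


def if_neuron_rates(x: np.ndarray, threshold: float = 1.0,
                    timesteps: int = 64) -> np.ndarray:
    """Simulate IF neurons driven by a constant input x and return their firing rates."""
    # Half-threshold initialization is a common trick to reduce the rate error
    v = np.full_like(x, 0.5 * threshold, dtype=np.float64)
    spike_count = np.zeros_like(v)
    for _ in range(timesteps):
        v += x                        # integrate the constant input current
        spikes = v >= threshold       # fire wherever the membrane crosses the threshold
        v -= spikes * threshold       # reset by subtraction keeps the residual charge
        spike_count += spikes
    return spike_count / timesteps    # firing rate in [0, 1]


x = np.array([-0.5, 0.0, 0.1, 0.37, 0.8, 1.5])
rates = if_neuron_rates(x, threshold=1.0, timesteps=64)
relu_target = np.clip(x, 0.0, 1.0)    # clipped ReLU that the rates approximate
print(np.round(rates, 3), relu_target)
```

Longer simulations narrow the gap between the firing rates and the clipped ReLU. This is the usual latency versus accuracy trade-off of converted SNNs.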
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural dynamics and brain function · Ferroelectric and Negative Capacitance Devices
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Adam · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization · Residual Connection
