SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer
Yue Liu, Shanlin Xiao, Bo Li, Zhiyi Yu

TL;DR
SparseSpikformer is a co-design framework that combines token pruning and weight pruning for Spiking Transformers. It reports over 90% weight sparsity and a 20% reduction in GFLOPs while maintaining accuracy.
Contribution
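The paper proposes a co-design framework for Spikformer with two parts: a Lottery Ticket Hypothesis (LTH) search that finds a subnetwork with over 90% weight sparsity, and a lightweight token selector that removes unimportant background tokens based on spike firing rate. The combination incurs minimal accuracy loss.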
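To make the Lottery Ticket idea concrete, here is a minimal sketch of iterative magnitude pruning with weight rewinding, the standard recipe associated with LTH. It is an illustration only: the paper's exact pruning schedule is not given in this summary, so `rounds`, `prune_fraction`, and the `train_stub` placeholder (which stands in for real training) are all assumptions.

```python
import random


def magnitude_mask(weights, mask, prune_fraction):
    """Zero out the smallest-magnitude surviving weights and return the new mask."""
    alive = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(alive) * prune_fraction)
    # Sort surviving indices by absolute weight, smallest first.
    alive.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in alive[:n_prune]:
        new_mask[i] = 0
    return new_mask


def train_stub(weights, mask, rng):
    """Hypothetical stand-in for training: perturbs surviving weights only."""
    return [(w + rng.gauss(0, 0.1)) * m for w, m in zip(weights, mask)]


def lottery_ticket(init_weights, rounds, prune_fraction, seed=0):
    """Repeat: rewind to the initial weights, train, prune by magnitude."""
    rng = random.Random(seed)
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Rewind: restart from the original initialization under the current mask.
        weights = [w * m for w, m in zip(init_weights, mask)]
        trained = train_stub(weights, mask, rng)
        mask = magnitude_mask(trained, mask, prune_fraction)
    return mask


rng = random.Random(42)
init = [rng.gauss(0, 1) for _ in range(1000)]
mask = lottery_ticket(init, rounds=11, prune_fraction=0.2)
sparsity = 1 - sum(mask) / len(mask)
print(f"surviving weights: {sum(mask)}, sparsity: {sparsity:.3f}")
```

Pruning 20% of the surviving weights per round compounds geometrically, which is why roughly a dozen rounds are enough to pass 90% sparsity in this toy setup.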
Findings
Achieves over 90% sparsity in model parameters.
Reduces GFLOPs by 20% without accuracy degradation.
Maintains competitive performance with a highly sparse model.
Abstract
As the third-generation neural network, the Spiking Neural Network (SNN) has the advantages of low power consumption and high energy efficiency, making it suitable for implementation on edge devices. More recently, the most advanced SNN, Spikformer, combines the self-attention module from Transformer with SNN to achieve remarkable performance. However, it adopts larger channel dimensions in MLP layers, leading to an increased number of redundant model parameters. To effectively decrease the computational complexity and weight parameters of the model, we explore the Lottery Ticket Hypothesis (LTH) and discover a very sparse (90%) subnetwork that achieves comparable performance to the original network. Furthermore, we also design a lightweight token selector module, which can remove unimportant background information from images based on the average spike firing rate of neurons,…
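The firing-rate criterion described in the abstract can be illustrated with a small sketch. It assumes binary spike tensors indexed as `spikes[t][n][c]` (time step, token, channel) and a top-k rule controlled by a hypothetical `keep_ratio` parameter; the paper's actual selector module may score and select tokens differently.

```python
def firing_rates(spikes):
    """Mean firing rate of each token, averaged over time steps and channels.

    spikes[t][n][c] is 0 or 1 for time step t, token n, channel c.
    """
    n_steps = len(spikes)
    n_tokens = len(spikes[0])
    n_channels = len(spikes[0][0])
    rates = []
    for n in range(n_tokens):
        total = sum(spikes[t][n][c]
                    for t in range(n_steps)
                    for c in range(n_channels))
        rates.append(total / (n_steps * n_channels))
    return rates


def select_tokens(spikes, keep_ratio):
    """Keep the tokens with the highest firing rates (assumed top-k rule)."""
    rates = firing_rates(spikes)
    n_keep = max(1, int(len(rates) * keep_ratio))
    ranked = sorted(range(len(rates)), key=lambda n: rates[n], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original token order
    pruned = [[step[n] for n in keep] for step in spikes]
    return keep, pruned


# Toy input: 2 time steps, 4 tokens, 3 channels.
spikes = [
    [[1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 0, 1]],
    [[1, 0, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]],
]
keep, pruned = select_tokens(spikes, keep_ratio=0.5)
print("kept token indices:", keep)
```

Tokens that rarely fire, such as the all-zero token at index 1 here, are treated as background and dropped, so later layers process fewer tokens.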
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Ferroelectric and Negative Capacitance Devices
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Softmax · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections · Absolute Position Encodings · Spiking Neural Networks
