TP-Spikformer: Token Pruned Spiking Transformer
Wenjie Wei, Xiaolong Zhou, Malu Zhang, Ammar Belatreche, Qian Sun, Yimeng Shan, Dehao Zhang, Zijian Zhou, Zeyu Ma, Yang Yang, Haizhou Li

TL;DR
This paper introduces TP-Spikformer, a token pruning method for spiking transformers that reduces storage and computational cost while maintaining competitive performance, making spiking neural networks easier to deploy on resource-constrained devices.
Contribution
The paper proposes a novel token pruning framework with a heuristic importance criterion and early stopping strategy, improving efficiency while keeping accuracy competitive in spiking transformers.
Findings
Effective reduction in computational overhead across multiple architectures.
Maintains competitive accuracy in diverse tasks.
Performs well in a training-free manner.
Abstract
Spiking neural networks (SNNs) offer an energy-efficient alternative to traditional neural networks due to their event-driven computing paradigm. However, recent advancements in spiking transformers have focused on improving accuracy with large-scale architectures, which require significant computational resources and limit deployment on resource-constrained devices. In this paper, we propose a simple yet effective token pruning method for spiking transformers, termed TP-Spikformer, that reduces storage and computational overhead while maintaining competitive performance. Specifically, we first introduce a heuristic spatiotemporal information-retaining criterion that comprehensively evaluates tokens' importance, assigning higher scores to informative tokens for retention and lower scores to uninformative ones for pruning. Based on this criterion, we propose an information-retaining…
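To make the score-then-prune idea in the abstract concrete, here is a minimal sketch of score-based token pruning. It is an illustration under stated assumptions, not the paper's method: the abstract does not define the spatiotemporal criterion, so the score used here (mean firing rate of each token across time steps and channels) is only a placeholder, and the names `token_scores`, `prune_tokens`, and `keep_ratio` are hypothetical.

```python
from typing import List, Tuple

# Spike tensor laid out as spikes[t][n][d]: time step, token index, channel.
Spikes = List[List[List[int]]]


def token_scores(spikes: Spikes) -> List[float]:
    """Score each token by its mean firing rate over time and channels.

    ASSUMPTION: this is a stand-in score, not the paper's criterion.
    """
    num_tokens = len(spikes[0])
    scores = []
    for n in range(num_tokens):
        fired = sum(sum(step[n]) for step in spikes)
        total = sum(len(step[n]) for step in spikes)
        scores.append(fired / total)
    return scores


def prune_tokens(spikes: Spikes, keep_ratio: float) -> Tuple[Spikes, List[int]]:
    """Keep the highest-scoring tokens and drop the rest.

    Returns the pruned spike tensor and the kept token indices,
    restored to their original order.
    """
    scores = token_scores(spikes)
    n_keep = max(1, int(keep_ratio * len(scores)))
    # Stable sort: ties are broken in favour of the lower token index.
    ranked = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    kept = sorted(ranked[:n_keep])
    pruned = [[step[n] for n in kept] for step in spikes]
    return pruned, kept


if __name__ == "__main__":
    # Two time steps, four tokens, two channels per token.
    example = [
        [[1, 1], [0, 0], [1, 0], [0, 0]],
        [[1, 0], [0, 0], [1, 1], [0, 1]],
    ]
    pruned, kept = prune_tokens(example, keep_ratio=0.5)
    print("kept token indices:", kept)
    print("pruned shape:", len(pruned), len(pruned[0]), len(pruned[0][0]))
```

Because the sketch only scores and selects tokens and updates no weights, it can be applied to a pretrained model without retraining, which is the setting the Findings describe as training-free.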
Peer Reviews
Decision·ICLR 2026 Poster
1. The narrative is very convincing that token-level sparsification is a very effective way for transformer-style models to boost computational efficiency. 2. This paper's experiments are very solid and extensive. The token sparsification method is verified across different tasks and multiple datasets, which makes the effectiveness very convincing. 3. This work demonstrates the most advanced token pruning results for spiking ViT in comparison with previous spiking token pruning methods.
The substantive contributions of this paper significantly overlap with previous work on ANN transformer sparsification, demonstrating limited novelty. It essentially replicates the success of existing ANN transformer sparsification approaches, with even the narrative framework bearing striking resemblances. 1. The spatial token scorer is highly similar to the ANN token-pruning one [1]. Thus, the novelty of this paper is significantly challenged. 2. The ablation study in Table 6 is conducted on
This paper presents a robust and highly impactful contribution, demonstrating exceptional strengths across originality, quality, clarity, and significance. The work's originality is outstanding, introducing a novel, training-free token pruning framework for spiking transformers, which stands in stark contrast to prior methods that require costly retraining and architectural modifications. The core ideas are creative and well-motivated: the bio-inspired IRToP criterion offers a new heuristic for
1. The paper would be strengthened by providing further experimental results, such as from an entropy or visualization perspective, to more rigorously justify the effectiveness of the Spatial and Temporal token scorers in measuring token importance. 2. The authors state that TP-Spikformer performs well even without training, but its accuracy drops significantly when using QKFormer and SDT-V3.
1. This paper proposes an information-retaining token pruning framework for spiking transformers. 2. The writing in this paper is good. 3. The experiments are quite extensive, including classification, segmentation, detection, and tracking tasks.
1. My main concern lies in the motivation. Currently, directly trained spiking Transformers are relatively small- or medium-scale models. Although pruning slightly reduces performance while improving throughput (e.g., TP-Spikformer with SDT-V1-8-768 on ImageNet: -1.53% acc, thr 29%), this trade-off does not constitute a strong motivation. 2. There is a lack of discussion on overall model training costs, such as training time and memory consumption. 3. There is a lack of spike-driven characteristics
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural Networks and Reservoir Computing
