Ge$^\text{2}$mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer
Zecheng Hao, Shenghao Xie, Kang Chen, Wenxuan Liu, Zhaofei Yu, Tiejun Huang

TL;DR
This paper introduces Ge$^2$mS-T, a novel multi-dimensional grouped computation architecture for Spiking Vision Transformers that significantly improves energy efficiency, accuracy, and memory usage.
Contribution
It proposes a new architecture with grouped computation across multiple dimensions, including a lossless conversion model and a multi-scale attention mechanism, addressing key limitations in SNNs.
Findings
Achieves superior performance on challenging benchmarks.
Demonstrates ultra-high energy efficiency compared to existing methods.
First systematic approach to multi-dimensional grouping in S-ViTs.
Abstract
Spiking Neural Networks (SNNs) offer superior energy efficiency over Artificial Neural Networks (ANNs). However, they encounter significant deficiencies in training and inference metrics when applied to Spiking Vision Transformers (S-ViTs). Existing paradigms including ANN-SNN Conversion and Spatial-Temporal Backpropagation (STBP) suffer from inherent limitations, precluding concurrent optimization of memory, accuracy and energy consumption. To address these issues, we propose GemS-T, a novel architecture implementing grouped computation across temporal, spatial and network structure dimensions. Specifically, we introduce the Grouped-Exponential-Coding-based IF (ExpG-IF) model, enabling lossless conversion with constant training overhead and precise regulation for spike patterns. Additionally, we develop Group-wise Spiking Self-Attention (GW-SSA) to reduce computational…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
