STAS: Spatio-Temporal Adaptive Computation Time for Spiking Transformers
Donghwa Kang, Doohyun Kim, Sang-Ki Ko, Jinkyu Lee, Brent ByungHoon Kang, Hyeongboo Baek

TL;DR
This paper introduces STAS, a framework that co-designs the static architecture and the adaptive computation policy of spiking Transformers, reducing energy consumption by up to 45.9% on CIFAR-10 while improving accuracy over state-of-the-art models on standard vision datasets.
Contribution
STAS unifies static architecture design and dynamic computation policy for spiking Transformers, replacing earlier dynamic computation methods that targeted spatial, temporal, or architecture-specific redundancies separately.
Findings
Reduces energy consumption by up to 45.9% on CIFAR-10.
Improves accuracy over state-of-the-art models.
Validated on CIFAR-10, CIFAR-100, and ImageNet datasets.
Abstract
Spiking neural networks (SNNs) offer energy efficiency over artificial neural networks (ANNs) but suffer from high latency and computational overhead due to their multi-timestep operational nature. While various dynamic computation methods have been developed to mitigate this by targeting spatial, temporal, or architecture-specific redundancies, they remain fragmented. While the principles of adaptive computation time (ACT) offer a robust foundation for a unified approach, its application to SNN-based vision Transformers (ViTs) is hindered by two core issues: the violation of its temporal similarity prerequisite and a static architecture fundamentally unsuited for its principles. To address these challenges, we propose STAS (Spatio-Temporal Adaptive computation time for Spiking transformers), a framework that co-designs the static architecture and dynamic computation policy. STAS…
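The temporal similarity prerequisite named in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity between a token's features at consecutive timesteps as the measure; the abstract does not state which metric the authors use, and the function names and toy vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an all-zero (silent) vector
        # is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def mean_temporal_similarity(features_per_timestep):
    """Average cosine similarity between consecutive timesteps."""
    if len(features_per_timestep) < 2:
        raise ValueError("need at least two timesteps to compare")
    pairs = zip(features_per_timestep, features_per_timestep[1:])
    sims = [cosine_similarity(a, b) for a, b in pairs]
    return sum(sims) / len(sims)


# Toy data (assumed, not from the paper): one token's features over 3 timesteps.
varying = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]  # input changes each step
aligned = [[1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1]]  # same input each step

low = mean_temporal_similarity(varying)
high = mean_temporal_similarity(aligned)
```

In this toy case the varying sequence scores far lower than the aligned one, which is the kind of gap that makes a halting decision taken at one timestep a poor predictor for the next.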
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. This paper identifies and solves a fundamental challenge in SNN-ViT models: since spike inputs vary across time steps, the model lacks temporal similarity, hindering the application of dynamic computation techniques such as ACT. 2. The design of the I-SPS module is both intuitive and effective, and the authors provide ablation studies that clearly validate its contribution. 3. The proposed method demonstrates impressive performance and efficiency across multiple datasets and model scales.
The A-SSA mechanism requires serial computation, accumulation, and checking of the halting score at each block and timestep. This serial checkpointing may introduce additional wall-clock latency, which should be examined through supplementary experiments.
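The serial accumulate-and-check pattern the reviewer describes can be sketched as follows. This is a minimal illustration of ACT-style halting in the sense of Graves (2016), where a unit stops once its cumulative halting score reaches 1 - epsilon. It is not the authors' A-SSA implementation: the function name, the `scores[t][l][k]` layout, and the toy values are all assumptions, and in a real model each score would be produced by the block itself rather than supplied up front.

```python
def act_halting_schedule(scores, epsilon=0.01):
    """Count how many (timestep, block) steps each token is processed.

    scores[t][l][k] is the (assumed, precomputed) halting score of token k
    at timestep t and block l. A token halts once its cumulative score
    reaches 1 - epsilon and is skipped in every later block and timestep.
    """
    num_tokens = len(scores[0][0])
    cumulative = [0.0] * num_tokens
    halted = [False] * num_tokens
    steps_used = [0] * num_tokens

    for timestep_scores in scores:          # serial over timesteps
        for block_scores in timestep_scores:  # serial over blocks
            for k in range(num_tokens):
                if halted[k]:
                    continue  # halted tokens cost no further computation
                steps_used[k] += 1
                cumulative[k] += block_scores[k]
                # The check depends on the running sum, so it cannot be
                # evaluated before the previous step has finished.
                if cumulative[k] >= 1.0 - epsilon:
                    halted[k] = True
    return steps_used


# Toy scores (assumed): 2 timesteps x 2 blocks x 3 tokens.
toy_scores = [
    [[0.6, 0.2, 0.1], [0.5, 0.3, 0.1]],
    [[0.9, 0.6, 0.1], [0.9, 0.9, 0.1]],
]
steps = act_halting_schedule(toy_scores)
```

Here the first token halts after two steps, the second after three, and the third never halts. The savings come from skipped steps, but each check waits on the preceding accumulation, which is the source of the wall-clock concern raised above.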
1. The authors identify a valid gap: existing SNN-based Transformers perform redundant computation across time and tokens. 2. The paper combines early stopping (temporal) and token pruning (spatial) in one unified formulation.
1. The key insight, early exit or adaptive computation, is not new. It originates from Graves (2016, Adaptive Computation Time for Recurrent Neural Networks) and has since been explored in: 1) ACT and Adaptive Depth Transformers (Liu et al., DynamicViT, NeurIPS 2021; Rao et al., DynamicViT: Efficient Vision Transformers by Adaptive Token Sampling, NeurIPS 2021) 2) Token pruning and early exit in ViTs (Yu et al., FastFormer, NeurIPS 2021; Elbayad et al., Depth-Adaptive Transformer, ACL 2020) 3)
1. The authors correctly identify the fundamental obstacle to applying ACT directly in spiking Transformers and provide visualizations and similarity analysis to support their claims. 2. The designs of the two components, i.e., I-SPS and A-SSA, are reasonable. 3. The paper conducts systematic experiments on the commonly used datasets of the SNN community and compares them against various SNN methods. 4. The article is clear and easy to understand.
1. I-SPS compresses spike inputs from multiple time steps into a single representation. While this design can improve temporal similarity and reduce the temporal cost, it discards information about dynamic changes across time steps. Are there less aggressive compression methods? The authors should try them. 2. The study of different timestep settings should be extended, especially to single-timestep methods such as OST and Att MS ResNet. 3. Missing the peer competi
1. The analysis of dynamic computation–related content is relatively comprehensive. 2. Most of the research methods are explained clearly and directly through mathematical formulations. 3. The experimental analysis is also sufficiently detailed, occupying the majority of the paper’s content.
1. The writing order and the layout of figures and text in the paper are confusing. The Introduction repeatedly mentions the principle of Adaptive Computation Time (ACT) and seems to treat it as the theoretical foundation of the paper, yet the first actual explanation of what ACT is does not appear until the Related Works section. This leaves readers puzzled while reading the introduction. The layout and distribution of Figure 1(a) are also highly disorganized — the model distributions are overl
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Neural Networks and Applications
