STAS: Spatio-Temporal Adaptive Computation Time for Spiking Transformers

Donghwa Kang; Doohyun Kim; Sang-Ki Ko; Jinkyu Lee; Brent ByungHoon Kang; Hyeongboo Baek

arXiv:2508.14138 · cs.LG · August 21, 2025

Open Access · 4 Reviews

TL;DR

This paper introduces STAS, a framework for spiking Transformer models that co-designs the static architecture and the adaptive computation policy, reducing energy consumption by up to 45.9% on CIFAR-10 while improving accuracy on standard vision datasets.

Contribution

STAS presents a unified approach that combines static architecture design with a dynamic computation policy for spiking Transformers. It addresses the fragmentation of prior dynamic computation methods, which target spatial, temporal, or architecture-specific redundancies separately.

Findings

01. Reduces energy consumption by up to 45.9% on CIFAR-10.

02. Improves accuracy over state-of-the-art models.

03. Validated on CIFAR-10, CIFAR-100, and ImageNet datasets.

Abstract

Spiking neural networks (SNNs) offer energy efficiency over artificial neural networks (ANNs) but suffer from high latency and computational overhead due to their multi-timestep operational nature. Various dynamic computation methods have been developed to mitigate this by targeting spatial, temporal, or architecture-specific redundancies, but they remain fragmented. Although the principles of adaptive computation time (ACT) offer a robust foundation for a unified approach, applying ACT to SNN-based vision Transformers (ViTs) is hindered by two core issues: the violation of its temporal similarity prerequisite and a static architecture fundamentally unsuited to its principles. To address these challenges, we propose STAS (Spatio-Temporal Adaptive computation time for Spiking transformers), a framework that co-designs the static architecture and dynamic computation policy. STAS…
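For readers unfamiliar with ACT, the halting rule it builds on can be sketched as follows. This is a minimal illustration of the generic ACT rule from Graves (2016), not the STAS method itself: a step emits a halting score in [0, 1], scores are accumulated, and computation stops at the first step where the running sum reaches 1 - eps. The function name `act_halt` and the default `eps` value are assumptions made for this example.

```python
import numpy as np

def act_halt(halting_scores, eps=0.01):
    """Apply an ACT-style halting rule to one sequence of halting scores.

    halting_scores: 1-D sequence of per-step scores in [0, 1].
    Returns (steps_used, weights), where weights sum to 1 over the
    steps that were actually executed.
    """
    scores = np.asarray(halting_scores, dtype=float)
    weights = np.zeros_like(scores)
    cumulative = 0.0
    last = len(scores) - 1
    for n, h in enumerate(scores):
        # Halt once the running sum reaches 1 - eps, or at the final step.
        if cumulative + h >= 1.0 - eps or n == last:
            weights[n] = 1.0 - cumulative  # remainder assigned to the halting step
            return n + 1, weights
        weights[n] = h
        cumulative += h
    return 0, weights  # reached only for an empty input

steps, weights = act_halt([0.3, 0.4, 0.5, 0.9])
```

In this usage example the fourth step is never executed, which is the source of the computational savings: its weight stays at zero and the output is a weighted mean of the earlier steps only.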

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. This paper identifies and solves a fundamental challenge in SNN-ViT models: since spike inputs vary across time steps, the model lacks temporal similarity, hindering the application of dynamic computation techniques such as ACT. 2. The design of the I-SPS module is both intuitive and effective, and the authors provide ablation studies that clearly validate its contribution. 3. The proposed method demonstrates impressive performance and efficiency across multiple datasets and model scales.

Weaknesses

The A-SSA mechanism requires serial computation, accumulation, and checking of the halting score at each block and timestep. This serial checkpointing may introduce additional wall-clock latency, which should be examined through supplementary experiments.
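The serial dependency the reviewer describes can be illustrated with a small sketch. It assumes, purely for illustration, that each token accumulates one halting score per (timestep, block) pair and is masked out once its running sum reaches 1 - eps; the paper's actual A-SSA update rule is not given in this excerpt. The function name `count_active_evaluations` and the array layout are hypothetical.

```python
import numpy as np

def count_active_evaluations(scores, eps=0.01):
    """Count token evaluations under per-token halting (illustrative only).

    scores: array of shape (T, L, N) holding one halting score per
            timestep, block, and token. This layout is an assumption.
    Returns (evaluations, halted), where halted is a boolean mask of
    tokens that stopped before the final (timestep, block) pair finished.
    """
    scores = np.asarray(scores, dtype=float)
    T, L, N = scores.shape
    cumulative = np.zeros(N)
    active = np.ones(N, dtype=bool)
    evaluations = 0
    for t in range(T):
        for l in range(L):
            # The active mask depends on every earlier (t, l) pair,
            # so this check cannot run ahead of the preceding blocks.
            evaluations += int(active.sum())
            cumulative[active] += scores[t, l, active]
            active &= cumulative < 1.0 - eps
    return evaluations, ~active

scores = np.zeros((2, 2, 3))
scores[..., 0] = 0.6  # token 0 halts early
scores[..., 1] = 0.3  # token 1 halts late
evaluations, halted = count_active_evaluations(scores)
dense = scores.size   # evaluations with no halting at all
```

Comparing `evaluations` with `dense` gives the reduction in token evaluations, but not the wall-clock cost: the mask update inside the loop is the serial checkpoint whose latency the reviewer asks the authors to measure.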

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. The authors identify a valid gap: existing SNN-based Transformers perform redundant computation across time and tokens. 2. The paper combines early stopping (temporal) and token pruning (spatial) in one unified formulation.

Weaknesses

1. The key insight, early exit or adaptive computation, is not new. It originates from Graves (2016, Adaptive Computation Time for Recurrent Neural Networks) and has since been explored in: 1) ACT and Adaptive Depth Transformers (Liu et al., DynamicViT, NeurIPS 2021; Rao et al., DynamicViT: Efficient Vision Transformers by Adaptive Token Sampling, NeurIPS 2021) 2) Token pruning and early exit in ViTs (Yu et al., FastFormer, NeurIPS 2021; Elbayad et al., Depth-Adaptive Transformer, ACL 2020) 3)

Reviewer 03 · Rating 6 · Confidence 5

Strengths

1. The authors correctly identify the fundamental obstacle to applying ACT directly in spiking Transformers and provide visualizations and similarity analysis to support their claims. 2. The designs of the two components, i.e., I-SPS and A-SSA, are reasonable. 3. The paper conducts systematic experiments on the commonly used datasets of the SNN community and compares them against various SNN methods. 4. The article is clear and easy to understand.

Weaknesses

1. I-SPS compresses spike inputs from multiple time steps into a single representation. While this design can improve temporal similarity and reduce the temporal cost, it discards information about dynamic changes across time steps. Are there less aggressive compression methods? The authors should try such methods. 2. The parameter studies on different numbers of timesteps should be explored further, especially for single-timestep methods like OST and Att MS ResNet. 3. Missing the peer competi

Reviewer 04 · Rating 2 · Confidence 4

Strengths

1. The analysis of dynamic computation–related content is relatively comprehensive. 2. Most of the research methods are explained clearly and directly through mathematical formulations. 3. The experimental analysis is also sufficiently detailed, occupying the majority of the paper’s content.

Weaknesses

1. The writing order and the layout of figures and text in the paper are confusing. The Introduction repeatedly mentions the principle of Adaptive Computation Time (ACT) and seems to treat it as the theoretical foundation of the paper, yet the first actual explanation of what ACT is does not appear until the Related Works section. This leaves readers puzzled while reading the introduction. The layout and distribution of Figure 1(a) are also highly disorganized — the model distributions are overl

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Neural Networks and Applications