Neural Dynamics Self-Attention for Spiking Transformers

Dehao Zhang; Fukai Guo; Shuai Wang; Jingya Wang; Jieyuan Zhang; Yimeng Shan; Malu Zhang; Yang Yang; Haizhou Li

arXiv:2603.19290 · cs.NE · March 23, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces LRF-Dyn, a novel spiking self-attention mechanism that enhances local modeling and reduces memory usage in Spiking Transformers, leading to improved performance and energy efficiency for edge vision tasks.

Contribution

The paper proposes LRF-Dyn, a biologically inspired localized receptive field approach that approximates attention, eliminating large matrix storage and boosting performance in Spiking Transformers.

Findings

01 Reduces memory overhead during inference.

02 Improves performance on visual tasks.

03 Enhances energy efficiency for edge applications.

Abstract

Integrating Spiking Neural Networks (SNNs) with Transformer architectures offers a promising pathway to balance energy efficiency and performance, particularly for edge vision applications. However, existing Spiking Transformers face two critical challenges: (i) a substantial performance gap compared to their Artificial Neural Network (ANN) counterparts and (ii) high memory overhead during inference. Through theoretical analysis, we attribute both limitations to the Spiking Self-Attention (SSA) mechanism: the lack of locality bias and the need to store large attention matrices. Inspired by the localized receptive fields (LRF) and membrane-potential dynamics of biological visual neurons, we propose LRF-Dyn, which uses spiking neurons with localized receptive fields to compute attention while reducing memory requirements. Specifically, we introduce an LRF method into SSA to assign higher…
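The two limitations named in the abstract (no locality bias, and a token-by-token attention map that must be held in memory) can be illustrated with a small sketch. This is not the paper's LRF-Dyn formulation: the square-window locality rule, the function names `spiking_attention` and `attention_entries`, and the parameters `radius` and `grid_w` are all assumptions introduced here for illustration only.

```python
from itertools import product


def spiking_attention(q, k, v, scale, radius=None, grid_w=None):
    """Softmax-free attention over binary spike tokens.

    q, k, v: lists of N tokens, each a list of D binary spikes (0 or 1).
    radius, grid_w: if given, token i only attends to tokens within a
    Chebyshev distance `radius` on a grid of width `grid_w`. This window
    is a hypothetical locality rule, not the paper's LRF definition.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        acc = [0.0] * d
        for j in range(n):
            if radius is not None:
                yi, xi = divmod(i, grid_w)
                yj, xj = divmod(j, grid_w)
                if max(abs(yi - yj), abs(xi - xj)) > radius:
                    continue  # key token lies outside the local window
            # Dot product of binary vectors = number of coincident spikes.
            score = sum(a & b for a, b in zip(q[i], k[j]))
            for c in range(d):
                acc[c] += scale * score * v[j][c]
        out.append(acc)
    return out


def attention_entries(n_tokens, grid_w=None, radius=None):
    """Count the (query, key) scores an attention map would hold."""
    if radius is None:
        return n_tokens * n_tokens  # dense N x N map
    grid_h = n_tokens // grid_w
    total = 0
    for y, x in product(range(grid_h), range(grid_w)):
        rows = min(y + radius, grid_h - 1) - max(y - radius, 0) + 1
        cols = min(x + radius, grid_w - 1) - max(x - radius, 0) + 1
        total += rows * cols  # window clipped at the grid border
    return total
```

Under these assumptions, a 14 × 14 token grid (196 tokens) gives `attention_entries(196)` = 38416 scores for the dense map, against `attention_entries(196, grid_w=14, radius=1)` = 1600 for a 3 × 3 window. The figures only show how a locality constraint shrinks the map; they say nothing about the savings reported in the paper.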

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

This paper demonstrates significant strengths across all key dimensions. Its originality is high, stemming from a novel problem analysis that pinpoints the lack of local modeling and high memory cost in Spiking Self-Attention (SSA) as key barriers. The proposed LRF-Dyn method is a creative and well-motivated solution, uniquely integrating a Localized Receptive Field (LRF) mechanism to enhance local feature capture and, more innovatively, reformulating the attention mechanism to mimic neuronal d

Weaknesses

1. To substantiate the claimed benefits of the proposed method, such as the reduction in memory overhead, the authors should provide concrete quantitative data, for instance, measurements of energy consumption and inference latency.
2. To further justify the necessity of multi-dendritic neurons, the authors are encouraged to include comparisons under identical conditions with other spiking neurons (e.g., LIF, ALIF, or DH-LIF) that serve as the core neurons in LRF-Dyn.
3. Given that LRF-Dyn exhibit

Reviewer 02 · Rating 6 · Confidence 5

Strengths

1. The paper identifies key limitations in existing SNN-Transformer architectures and substantiates these claims through comprehensive empirical observations and quantitative analyses, making the motivation convincing.
2. Extensive experiments are conducted on recent state-of-the-art Transformer architectures, showing consistent and notable improvements in both image classification and segmentation tasks.
3. The authors provide an analysis of the local receptive field degradation in SNNs from bo

Weaknesses

1. The adoption of multi-dendritic neurons increases training complexity, potentially leading to higher computational costs on large-scale datasets such as ImageNet.
2. The presentation of inference memory requirements could be improved. Providing a tabular summary for LRF-Attn and LRF-Dyn would make the comparison clearer and more intuitive.

Reviewer 03 · Rating 2 · Confidence 5

Strengths

1. Theoretical analyses are provided to support the proposed LRF-SSA mechanism.
2. The proposed LRF-Dyn maintains performance comparable to LRF-SSA while reducing memory requirements during inference.

Weaknesses

1. The motivation behind LRF-SSA requires further validation. The paper points out a mismatch in attention patterns between SSA and VSA: as shown in Fig. 1(a) and Fig. 2, VSA emphasizes local relations, while SSA exhibits unfocused global attention. However, this lack of local attention in SSA is a result of its inherent limitations rather than the root cause of its shortcomings. Therefore, enhancing local attention may not fundamentally resolve the issue. In fact, this issue stems from informat

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Neural dynamics and brain function