Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks
Jieyuan Zhang, Xiaolong Zhou, Shuai Wang, Wenjie Wei, Hanwen Liu, Qian Sun, Malu Zhang, Yang Yang, Haizhou Li

TL;DR
This paper introduces the ST-ERF framework to analyze the spatial-temporal receptive fields of SNNs, revealing limitations in global feature modeling and proposing architectures to enhance performance in visual long-sequence tasks.
Contribution
The paper proposes the ST-ERF analysis tool and two novel channel-mixer architectures, MLPixer and SRB, which strengthen the global spatial ERF of Transformer-based SNNs and thereby improve their performance.
Findings
ST-ERF reveals limited global receptive fields in current SNNs.
Proposed architectures enhance global ERF and improve task performance.
Experiments validate the effectiveness of the methods across multiple tasks.
Abstract
Spiking Neural Networks (SNNs) demonstrate significant potential for energy-efficient neuromorphic computing through an event-driven paradigm. While training methods and computational models have greatly advanced, SNNs still struggle to achieve competitive performance in visual long-sequence modeling tasks. In artificial neural networks, the effective receptive field (ERF) serves as a valuable tool for analyzing feature extraction capabilities in visual long-sequence modeling. Inspired by this, we introduce the Spatio-Temporal Effective Receptive Field (ST-ERF) to analyze the ERF distributions across various Transformer-based SNNs. Based on the proposed ST-ERF, we reveal that these models struggle to establish a robust global ST-ERF, which limits their visual feature modeling capabilities. To overcome this issue, we propose two novel channel-mixer architectures:…
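To make the ERF idea mentioned in the abstract concrete, here is a minimal sketch of the classic ERF measurement for ordinary (non-spiking) networks: the gradient of one output unit with respect to every input position. It is not the paper's ST-ERF, which is not specified in this summary. The setup is an assumption chosen for illustration: a 1D stack of identical linear convolution layers with an odd-length kernel, and hypothetical helper names (`conv1d_same`, `effective_receptive_field`, `erf_extent`).

```python
def conv1d_same(signal, kernel):
    """Zero-padded 'same' 1D correlation in pure Python (odd-length kernel)."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def effective_receptive_field(width, num_layers, kernel):
    """Gradient of the centre output unit w.r.t. each input position.

    Assumes a stack of identical linear conv layers with no nonlinearity,
    so backpropagation reduces to repeated correlation with the flipped kernel.
    """
    grad = [0.0] * width
    grad[width // 2] = 1.0      # gradient at the output layer is an impulse
    flipped = kernel[::-1]      # backward pass of a correlation layer
    for _ in range(num_layers):
        grad = conv1d_same(grad, flipped)
    return grad


def erf_extent(erf, threshold=0.05):
    """Count input positions whose |gradient| is at least threshold * peak."""
    peak = max(abs(g) for g in erf)
    return sum(1 for g in erf if abs(g) >= threshold * peak)


if __name__ == "__main__":
    erf = effective_receptive_field(width=31, num_layers=5, kernel=[1 / 3] * 3)
    theoretical = sum(1 for g in erf if g != 0.0)
    print("theoretical receptive field:", theoretical)
    print("effective extent (>= 5% of peak):", erf_extent(erf))
```

In this toy setting the gradient mass concentrates near the centre, so the effective extent is smaller than the theoretical receptive field. A gap of this kind between nominal and effective coverage is what an ERF analysis is meant to expose.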
