Fire on Motion: Optimizing Video Pass-bands for Efficient Spiking Action Recognition
Shuhan Ye, Yuanbin Qian, Yi Yu, Chong Wang, Yuqi Xie, Jiazhen Xu, Kun Wang, Xudong Jiang

TL;DR
This paper identifies a pass-band mismatch in spiking neural networks that limits their performance on dynamic video tasks and proposes a lightweight optimizer to enhance motion sensitivity, significantly improving results on action recognition benchmarks.
Contribution
The paper introduces the Pass-Bands Optimizer (PBO), a novel, lightweight module that enhances SNNs' ability to focus on motion-relevant information in videos without architectural changes.
Findings
PBO improves UCF101 accuracy by over 10 percentage points.
PBO achieves consistent gains in multi-modal action recognition.
PBO enhances weakly supervised video anomaly detection.
Abstract
Spiking neural networks (SNNs) have gained traction in vision due to their energy efficiency, bio-plausibility, and inherent temporal processing. Yet, despite this temporal capacity, most progress concentrates on static image benchmarks, and SNNs still underperform on dynamic video tasks compared to artificial neural networks (ANNs). In this work, we diagnose a fundamental pass-band mismatch: standard spiking dynamics behave as a temporal low-pass filter that emphasizes static content while attenuating motion-bearing bands, where task-relevant information concentrates in dynamic tasks. This phenomenon explains why SNNs can approach ANNs on static tasks yet fall behind on tasks that demand richer temporal understanding. To remedy this, we propose the Pass-Bands Optimizer (PBO), a plug-and-play module that optimizes the temporal pass-band toward task-relevant motion bands. PBO introduces only two…
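The low-pass behaviour described in the abstract can be illustrated with a minimal sketch. It assumes a normalized leaky integrator, u[t] = beta * u[t-1] + (1 - beta) * x[t], with no spiking or reset; this form, the value beta = 0.5, and the function names are illustrative assumptions, not the paper's exact neuron model. Under that assumption the magnitude response is (1 - beta) / sqrt(1 + beta^2 - 2 * beta * cos(omega)), which equals 1 at omega = 0 and falls monotonically toward omega = pi.

```python
import numpy as np


def lif_magnitude(omega, beta):
    """|H(e^{j*omega})| of u[t] = beta*u[t-1] + (1-beta)*x[t] (assumed form, no reset)."""
    omega = np.asarray(omega, dtype=float)
    return (1.0 - beta) / np.sqrt(1.0 + beta**2 - 2.0 * beta * np.cos(omega))


def leaky_integrate(x, beta):
    """Run the same recursion in the time domain, starting from u = 0."""
    u = 0.0
    out = []
    for v in x:
        u = beta * u + (1.0 - beta) * v
        out.append(u)
    return np.array(out)


beta = 0.5  # illustrative leak factor
omegas = np.linspace(0.0, np.pi, 5)
gains = lif_magnitude(omegas, beta)  # gain 1 at omega = 0, gain 1/3 at omega = pi

# Time-domain check: a static input passes through, a fast-alternating one is attenuated.
static = np.ones(200)
alternating = np.array([(-1.0) ** t for t in range(200)])
static_out = leaky_integrate(static, beta)[-1]
alternating_out = abs(leaky_integrate(alternating, beta)[-1])
```

With beta = 0.5, the constant input settles at 1 while the alternating input settles at an amplitude of 1/3, matching the closed-form gains at omega = 0 and omega = pi.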
Peer Reviews
Decision · Submitted to ICLR 2026
The proposed Pass-Bands Optimizer improves performance on various tasks.
1. The theoretical derivation in this article is inaccurate: (1) From Eqs. (2–5), the authors try to link the time-domain convolution in Eq. (2) to a band-pass filter; however, $\omega_1<\omega_2$ is not ensured, and Eqs. (3–5) are disconnected from Eq. (2). The claimed band-pass property is ungrounded. (2) Eqs. (7–9) largely overlap with [1], merely rewriting the same expressions from the Z domain into the frequency domain. In line 172, the authors state that $|H_{LIF}(e^{j0})|^2 = 1$; however, t
1. Motivation is clear and backed by analysis. The LIF low‑pass characterization is explicit (Eq. 9 with Fig. 1). 2. Small, deployable change. The pre‑filter is two‑tap, streaming‑friendly, and sits outside the backbone. The claim of “two learnable parameters” is attractive for deployment. 3. Experiments show consistent empirical gains across tasks/backbones without architectural edits.
1. The paper repeatedly emphasizes only two learnable scalars ($\mu,\omega$), yet Table 4 ablates amplitude $A$, which materially changes accuracy. Clarify whether $A$ (and $\phi$) are fixed or tuned per dataset—this affects both training efficiency and fair comparison claims. If $A$ is tuned per dataset, the comparison is not purely “two‑scalar” anymore; please make this precise. 2. A fair question is whether *simpler* band‑pass designs (e.g., a time-invariant learnable) would match PBO. The pa
1. Clear problem statement - The paper’s primary strength is its clear, frequency-domain analysis of the "pass-band mismatch" in SNNs. Reframing the SNN video processing bottleneck from a signal-processing perspective is a good contribution that clearly explains why SNNs, despite their temporal nature, might fail at motion tasks. 2. Extensive experiments - The method achieves substantial, not marginal, accuracy gains such as +10.55% and +11.55% on UCF101. 3. Efficiency - PBO is a lightweight
1. Pass-Band claims - Figure 1(c) says that PBO creates a 'task-optimal pass-band', but I cannot find other cases. 2. Limited Ablation studies - Loss combination & only applied to UCV101-CEP dataset. 3. Hyperparameter setting - Related to the second weakness. Are A=0.1 and $\alpha=1\times 10^{-2}$ optimal parameters for every dataset? 4. Theoretical inconsistency - In Appendix A, the authors state that a single $\lambda$ cannot create a mid-band peak. However, Appendix E shows $\lambda [t] \approx$ const
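The point about a single constant $\lambda$ can be checked numerically under stated assumptions. The sketch below assumes a two-tap pre-filter of the form y[t] = x[t] - lam * x[t-1] followed by a normalized leaky integrator u[t] = beta * u[t-1] + (1 - beta) * y[t]; both forms, the chosen values, and all names are hypothetical stand-ins, not the paper's definitions. For this one-pole, one-zero cascade the squared gain is a ratio of two linear functions of cos(omega), so it is monotone in omega and cannot peak in the middle of the band.

```python
import numpy as np


def cascade_gain_sq(omega, beta, lam):
    """|H|^2 of the assumed cascade: y[t] = x[t] - lam*x[t-1], then a leaky integrator."""
    c = np.cos(np.asarray(omega, dtype=float))
    num = 1.0 + lam**2 - 2.0 * lam * c    # |1 - lam * e^{-j*omega}|^2
    den = 1.0 + beta**2 - 2.0 * beta * c  # |1 - beta * e^{-j*omega}|^2
    return (1.0 - beta) ** 2 * num / den


def is_monotone(gain):
    """True if the sampled response is strictly rising or strictly falling."""
    d = np.diff(gain)
    return bool(np.all(d > 0) or np.all(d < 0))


beta = 0.5                                # illustrative leak factor
lams = [-0.9, -0.5, 0.0, 0.3, 0.8, 0.95]  # constant taps, none equal to beta
omegas = np.linspace(0.0, np.pi, 513)

# Every constant tap yields a monotone response, so no mid-band peak appears.
monotone = {lam: is_monotone(cascade_gain_sq(omegas, beta, lam)) for lam in lams}
```

In this sketch a tap below beta (for example 0.3) leaves the cascade low-pass and a tap above beta (for example 0.8) makes it high-pass; a constant tap only tilts the response one way or the other.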
Taxonomy
Topics: Advanced Memory and Neural Computing · Human Pose and Action Recognition · Ferroelectric and Negative Capacitance Devices
