SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity

Shihao Zou; Qingfeng Li; Wei Ji; Jingjing Li; Yongkui Yang; Guoqi Li; Chao Dong

arXiv:2505.10352 · cs.CV · May 16, 2025

TL;DR

SpikeVideoFormer introduces a spike-driven video Transformer with linear temporal complexity and Hamming attention, achieving state-of-the-art performance and efficiency in various video vision tasks.

Contribution

The paper proposes a spike-driven Hamming attention mechanism and a video Transformer with linear temporal complexity, advancing SNN applications in video analysis.

Findings

01

Achieves over 15% improvement over existing SNN approaches on pose tracking and segmentation tasks.

02

Matches the performance of recent ANN-based methods while offering significant efficiency gains.

03

Maintains linear temporal complexity $\mathcal{O}(T)$ in video processing.

Abstract

Spiking Neural Networks (SNNs) have shown competitive performance to Artificial Neural Networks (ANNs) in various vision tasks, while offering superior energy efficiency. However, existing SNN-based Transformers primarily focus on single-image tasks, emphasizing spatial features while not effectively leveraging SNNs' efficiency in video-based vision tasks. In this paper, we introduce SpikeVideoFormer, an efficient spike-driven video Transformer, featuring linear temporal complexity $\mathcal{O}(T)$. Specifically, we design a spike-driven Hamming attention (SDHA) which provides a theoretically guided adaptation from traditional real-valued attention to spike-driven attention. Building on SDHA, we further analyze various spike-driven space-time attention designs and identify an optimal scheme that delivers appealing performance for video tasks, while maintaining only linear temporal…
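To make the idea of a Hamming-based attention score concrete, here is a minimal sketch in plain Python. It assumes the score between a binary spike query and key is the normalized Hamming similarity, that is, the fraction of positions where the two vectors agree. The abstract does not give the exact SDHA formulation, so the function names `hamming_similarity` and `hamming_scores`, the toy vectors, and the normalization are illustrative assumptions rather than the authors' implementation.

```python
def hamming_similarity(q, k):
    """Fraction of positions where two equal-length binary spike vectors agree.

    Assumption: normalized Hamming similarity stands in for the attention
    score; the paper's actual SDHA definition may differ.
    """
    if len(q) != len(k):
        raise ValueError("spike vectors must have the same length")
    matches = sum(1 for a, b in zip(q, k) if a == b)
    return matches / len(q)


def hamming_scores(queries, keys):
    """Score matrix: scores[i][j] is the similarity of query i and key j."""
    return [[hamming_similarity(q, k) for k in keys] for q in queries]


# Toy binary spike vectors (hypothetical values, feature dimension D = 4).
queries = [[1, 0, 1, 0], [0, 0, 0, 0]]
keys = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 0, 0]]

scores = hamming_scores(queries, keys)

# Plain dot-product scores on the same spikes, for comparison.
dot_scores = [[sum(a * b for a, b in zip(q, k)) for k in keys] for q in queries]

print(scores)      # [[1.0, 0.0, 0.5], [0.5, 0.5, 1.0]]
print(dot_scores)  # [[2, 0, 0], [0, 0, 0]]
```

In this toy example the all-zero query has a zero dot product with every key, while the Hamming similarity still ranks the all-zero key highest, because positions where both vectors are silent count as agreement.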

Code & Models

Repositories

jimmyzou/spikevideoformer (PyTorch, official)

Taxonomy

Topics: Neural Networks and Reservoir Computing · Image Enhancement Techniques · CCD and CMOS Imaging Sensors

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Focus · Byte Pair Encoding · Softmax · Absolute Position Encodings