BSViT: A Burst Spiking Vision Transformer for Expressive and Efficient Visual Representation Learning

Hongxiang Peng; Dewei Bai; Hong Qu

arXiv:2604.23165 · cs.CV · April 28, 2026

TL;DR

BSViT is a burst spiking Vision Transformer that pairs dual-channel burst spiking self-attention with local patch adjacency masking to improve accuracy and efficiency on visual learning tasks.

Contribution

The paper proposes a Dual-Channel Burst Spiking Self-Attention (DBSSA) mechanism and patch adjacency masking to enhance representational capacity and reduce computation in spiking Vision Transformers.
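
As a rough illustration of what a patch adjacency mask could look like, the sketch below builds a boolean mask over a grid of image patches so that each token only interacts with its spatial neighbours. The function name `patch_adjacency_mask`, the `radius` parameter, and the square (Chebyshev) neighbourhood are assumptions made for this example; the summary does not specify how the paper defines adjacency.

```python
import numpy as np

def patch_adjacency_mask(grid_h, grid_w, radius=1):
    """Return an (N, N) boolean mask with N = grid_h * grid_w.

    mask[i, j] is True when patches i and j are at most `radius` apart
    in both row and column (an assumed neighbourhood definition).
    """
    # Recover each token's (row, col) position on the patch grid.
    rows, cols = np.divmod(np.arange(grid_h * grid_w), grid_w)
    d_row = np.abs(rows[:, None] - rows[None, :])
    d_col = np.abs(cols[:, None] - cols[None, :])
    return np.maximum(d_row, d_col) <= radius

# Hypothetical 4x4 patch grid: 16 tokens.
mask = patch_adjacency_mask(4, 4, radius=1)
```

Applying such a mask to the attention scores would zero out interactions between distant patches, which is one plausible way to reduce token interactions relative to dense global attention.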

Findings

01. BSViT outperforms existing spiking Transformers on static and event-based benchmarks.

02. The model maintains energy efficiency through addition-only operations.

03. Incorporating burst spike coding increases spike-level representational capacity.

Abstract

Spiking Vision Transformers (S-ViTs) offer a promising framework for energy-efficient visual learning. However, existing designs remain limited by two fundamental issues: the restricted information capacity of binary spike coding and the dense token interactions introduced by global self-attention. To address these challenges, this work proposes BSViT, a burst spiking-driven Vision Transformer featuring a Dual-Channel Burst Spiking Self-Attention (DBSSA) mechanism. DBSSA encodes queries with binary spikes and keys with burst spikes to enhance representational capacity. The value pathway adopts dual excitatory and inhibitory binary channels, enabling signed modulation and richer spike interactions. Importantly, the entire attention operation preserves addition-only computation, ensuring compatibility with energy-efficient neuromorphic hardware. To further reduce spike activity and…
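
The encoding described in the abstract can be sketched with toy tensors. This is a minimal illustration under stated assumptions, not the authors' implementation: the sizes `N`, `D`, and `MAX_BURST`, the helper names `additive_scores` and `signed_readout`, the absence of any scaling or normalization, and the random inputs are all hypothetical. The point is only to show why binary queries, integer burst keys, and excitatory/inhibitory binary values allow attention to be computed by accumulation alone.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, MAX_BURST = 4, 8, 3  # hypothetical token count, feature dim, burst cap

q = rng.integers(0, 2, size=(N, D))              # queries: binary spikes {0, 1}
k = rng.integers(0, MAX_BURST + 1, size=(N, D))  # keys: burst counts {0..MAX_BURST}
v_exc = rng.integers(0, 2, size=(N, D))          # excitatory value channel (binary)
v_inh = rng.integers(0, 2, size=(N, D))          # inhibitory value channel (binary)

def additive_scores(q, k):
    """scores[i, j] = sum of k[j, d] over the features d where q[i, d] fired."""
    scores = np.zeros((q.shape[0], k.shape[0]), dtype=np.int64)
    for i in range(q.shape[0]):
        active = q[i].astype(bool)          # binary query acts as a selector
        scores[i] = k[:, active].sum(axis=1)
    return scores

def signed_readout(scores, v_exc, v_inh):
    """Accumulate scores selected by excitatory spikes, subtract inhibitory ones."""
    out = np.zeros((scores.shape[0], v_exc.shape[1]), dtype=np.int64)
    for d in range(v_exc.shape[1]):
        exc = v_exc[:, d].astype(bool)
        inh = v_inh[:, d].astype(bool)
        out[:, d] = scores[:, exc].sum(axis=1) - scores[:, inh].sum(axis=1)
    return out

scores = additive_scores(q, k)
out = signed_readout(scores, v_exc, v_inh)
```

Both helpers reproduce the ordinary matrix products `q @ k.T` and `scores @ (v_exc - v_inh)` while using only selection and signed accumulation, since every multiplication would have a binary operand.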
