Hybrid Spiking Vision Transformer for Object Detection with Event Cameras

Qi Xu; Jie Deng; Jiangrong Shen; Biwu Chen; Huajin Tang; Gang Pan

arXiv:2505.07715·cs.CV·May 13, 2025

TL;DR

This paper introduces a hybrid spiking vision Transformer model that effectively captures spatiotemporal features for event-based object detection, achieving improved accuracy with fewer parameters, and provides a new dataset for benchmarking.

Contribution

The study proposes the HsVT model, which combines spatial and temporal feature extraction modules for event-based object detection, and releases a new Fall Detection Dataset for benchmarking.

Findings

01 HsVT outperforms existing models in event-based object detection accuracy.

02 The model achieves these results with fewer parameters.

03 The Fall Detection Dataset supports future research in this area.

Abstract

Event-based object detection has gained increasing attention due to its advantages such as high temporal resolution, wide dynamic range, and asynchronous address-event representation. Leveraging these advantages, Spiking Neural Networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich spatiotemporal dynamics. To further enhance the performance of event-based object detection, this study proposes a novel hybrid spike vision Transformer (HsVT) model. The HsVT model integrates a spatial feature extraction module to capture local and global features, and a temporal feature extraction module to model time dependencies and long-term patterns in event sequences. This combination enables HsVT to capture spatiotemporal features, improving its capability to handle complex event-based object detection tasks. To support research in this area, we developed and…
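To make the abstract's terms concrete, the following is a minimal sketch of how an asynchronous event stream might be binned over time and passed through a spiking neuron. It is not the HsVT architecture: the leaky integrate-and-fire (LIF) neuron, the `(x, y, t, polarity)` event tuple with microsecond timestamps, the function names `bin_events` and `lif_spikes`, and the `decay`, `threshold`, and gain values are all assumptions chosen for illustration.

```python
def bin_events(events, num_bins, t_start, t_end):
    """Count events per time bin.

    Each event is an assumed (x, y, t, polarity) tuple with an integer
    timestamp t in microseconds. Events outside [t_start, t_end) are skipped.
    """
    counts = [0] * num_bins
    span = t_end - t_start
    for _x, _y, t, _polarity in events:
        if t_start <= t < t_end:
            # Integer arithmetic keeps the bin index exact.
            idx = (t - t_start) * num_bins // span
            counts[idx] += 1
    return counts


def lif_spikes(inputs, decay=0.5, threshold=1.0):
    """Leaky integrate-and-fire neuron with a hard reset (illustrative only)."""
    v = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        v = decay * v + x      # leak the old potential, then integrate the input
        if v >= threshold:
            spikes.append(1)   # fire
            v = 0.0            # reset after a spike
        else:
            spikes.append(0)
    return spikes


# Three hypothetical events inside a 1-second window, split into 4 bins.
events = [(3, 4, 50_000, 1), (3, 5, 300_000, 0), (2, 4, 600_000, 1)]
counts = bin_events(events, num_bins=4, t_start=0, t_end=1_000_000)
spikes = lif_spikes([0.6 * c for c in counts])  # 0.6 is an arbitrary input gain
```

In this toy run no single bin is strong enough to trigger a spike on its own; the neuron fires only once input from consecutive bins has accumulated in the membrane potential, which is the kind of temporal dependency a spiking model can carry across time steps.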

Taxonomy

Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · EEG and Brain-Computer Interfaces

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax