HDI-Former: Hybrid Dynamic Interaction ANN-SNN Transformer for Object Detection Using Frames and Events
Dianze Li, Jianing Li, Xu Liu, Zhaokun Zhou, Xiaopeng Fan, Yonghong Tian

TL;DR
HDI-Former introduces a hybrid ANN-SNN transformer architecture that effectively combines frame and event data for high-accuracy, energy-efficient object detection, utilizing novel attention and interaction mechanisms.
Contribution
HDI-Former is the first directly trained hybrid ANN-SNN architecture for object detection using frames and events, designed to improve both cross-modality interaction and energy efficiency.
Findings
Significantly outperforms eleven state-of-the-art methods.
The SNN branch achieves accuracy comparable to an ANN while using 10.57× less energy.
Effectively models temporal cues from event streams.
Abstract
Combining the complementary benefits of frames and events has been widely used for object detection in challenging scenarios. However, most object detection methods use two independent Artificial Neural Network (ANN) branches, limiting cross-modality information interaction across the two visual streams and encountering challenges in extracting temporal cues from event streams with low power consumption. To address these challenges, we propose HDI-Former, a Hybrid Dynamic Interaction ANN-SNN Transformer, marking the first trial to design a directly trained hybrid ANN-SNN architecture for high-accuracy and energy-efficient object detection using frames and events. Technically, we first present a novel semantic-enhanced self-attention mechanism that strengthens the correlation between image encoding tokens within the ANN Transformer branch for better performance. Then, we design a Spiking…
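To make the "SNN" part of the abstract concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, a common building block of spiking networks. It is a generic illustration, not the neuron model or code used in HDI-Former; the function name `lif_neuron` and the parameters `decay`, `v_threshold`, and `v_reset` are assumptions chosen for this example.

```python
def lif_neuron(inputs, decay=0.5, v_threshold=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Hypothetical illustration only: names and default values are assumptions,
    not taken from the HDI-Former paper.
    """
    v = v_reset          # membrane potential
    spikes = []
    for x in inputs:
        # Leak the previous potential, then integrate the new input.
        v = v * decay + x
        if v >= v_threshold:
            spikes.append(1)   # emit a binary spike
            v = v_reset        # hard reset after firing
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # A constant sub-threshold input accumulates over time until the neuron fires.
    print(lif_neuron([0.6, 0.6, 0.6, 0.6]))  # [0, 0, 1, 0]
```

The neuron carries state between time steps, which is how spiking models can pick up temporal cues from an event stream, and its output is a sparse binary sequence, which is the usual basis for energy-efficiency arguments about SNNs.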
Taxonomy
Topics: Infrared Target Detection Methodologies · Industrial Vision Systems and Defect Detection
Methods: Attention Is All You Need · Residual Connection · Softmax · Adam · Label Smoothing · Dropout · Dense Connections · Spiking Neural Networks · Linear Layer · Stochastic Depth
