SODFormer: Streaming Object Detection with Transformer Using Events and Frames

Dianze Li; Jianing Li; Yonghong Tian

arXiv:2308.04047 · cs.CV · August 9, 2023

TL;DR

SODFormer is a novel Transformer-based streaming object detection framework that fuses asynchronous event and frame data, leveraging rich temporal cues to improve detection in challenging conditions like fast motion and low light.

Contribution

This work introduces a new multimodal neuromorphic dataset and a Transformer architecture for asynchronous object detection that fuses event and frame streams in real time.

Findings

01 Outperforms four state-of-the-art methods and eight baselines.

02 Effective in high-speed motion and low-light scenarios.

03 Demonstrates the advantage of asynchronous fusion over synchronized methods.

Abstract

The DAVIS camera, which streams two complementary sensing modalities (asynchronous events and frames), has gradually been used to address major object detection challenges (e.g., fast motion blur and low light). However, how to effectively leverage rich temporal cues and fuse two heterogeneous visual streams remains a challenge. To address it, we propose a novel streaming object detector with Transformer, namely SODFormer, which first integrates events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) with over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture to detect objects as an end-to-end sequence prediction problem, where the novel temporal Transformer module leverages rich temporal cues from two visual…
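
The abstract describes events as an asynchronous stream delivered alongside conventional frames. As a rough illustration only, the sketch below shows one common way to make such a stream consumable by a frame-based network: counting events per pixel and per polarity over a time window. The (x, y, t, polarity) tuple layout, the function name events_to_count_image, and the windowing scheme are assumptions made for this example; they are not taken from the paper or from the dianzl/sodformer repository.

```python
import numpy as np


def events_to_count_image(events, height, width, t_start, t_end):
    """Accumulate events with t in [t_start, t_end) into a (2, H, W) count image.

    Hypothetical helper: channel 0 counts negative-polarity events,
    channel 1 counts positive-polarity events.
    """
    image = np.zeros((2, height, width), dtype=np.int32)
    for x, y, t, polarity in events:
        if t_start <= t < t_end:
            channel = 1 if polarity > 0 else 0
            image[channel, y, x] += 1
    return image


# Toy event stream: (x, y, timestamp in seconds, polarity in {-1, +1}).
events = [
    (0, 0, 0.001, 1),
    (1, 0, 0.002, -1),
    (0, 0, 0.003, 1),
    (2, 1, 0.020, 1),  # falls outside the 10 ms window used below
]

window = events_to_count_image(events, height=2, width=3, t_start=0.0, t_end=0.010)
print(window.shape)  # (channels, height, width)
```

Because the window boundaries are free parameters, such an image can be built at any moment rather than only at frame timestamps, which is one simple way to think about asynchronous input. How SODFormer itself represents and fuses events is described in the paper, not here.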

Code & Models

Repositories

dianzl/sodformer
PyTorch · Official

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding