MambaDETR: Query-based Temporal Modeling using State Space Model for Multi-View 3D Object Detection

Tong Ning; Ke Lu; Xirui Jiang; Jian Xue

arXiv:2411.13628 · cs.CV · November 22, 2024

Open Access

TL;DR

MambaDETR performs temporal fusion for multi-view 3D object detection in an efficient state space, avoiding the quadratic computational cost and information decay of transformer-based fusion over long frame sequences. It reports state-of-the-art results among temporal fusion methods on the nuScenes autonomous driving benchmark.

Contribution

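The paper presents MambaDETR, a method that implements temporal fusion in a state space to address the quadratic computational cost and information decay of transformer-based methods, together with a Motion Elimination module that removes relatively static objects from temporal fusion.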

Findings

01 Achieves state-of-the-art performance among existing temporal fusion methods on the nuScenes benchmark

02 Avoids the quadratic computational cost of traditional transformer-based temporal fusion

03 Uses a Motion Elimination module to remove relatively static objects from temporal fusion

Abstract

Utilizing temporal information to improve the performance of 3D detection has made great progress recently in the field of autonomous driving. Traditional transformer-based temporal fusion methods suffer from quadratic computational cost and information decay as the length of the frame sequence increases. In this paper, we propose a novel method called MambaDETR, whose main idea is to implement temporal fusion in the efficient state space. Moreover, we design a Motion Elimination module to remove the relatively static objects for temporal fusion. On the standard nuScenes benchmark, our proposed MambaDETR achieves remarkable results in the 3D object detection task, exhibiting state-of-the-art performance among existing temporal fusion methods.
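To make the cost argument concrete, the following is a minimal sketch of a linear-time state space recurrence over a sequence of per-frame query features. It is not the authors' implementation: the function names `ssm_scan` and `pairwise_interactions`, the diagonal parameters `a`, `b`, `c`, and the toy inputs are all assumptions for illustration, and the parameters are fixed here whereas Mamba-style models make them input-dependent.

```python
from typing import List, Sequence


def ssm_scan(
    frames: Sequence[Sequence[float]],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[List[float]]:
    """Run a diagonal linear state space recurrence over per-frame features.

    h_t = a * h_{t-1} + b * x_t   (elementwise, one hidden state per channel)
    y_t = c * h_t

    `a`, `b`, `c` are hypothetical fixed parameters chosen for illustration.
    """
    dim = len(a)
    h = [0.0] * dim  # hidden state carried across frames
    outputs = []
    for x in frames:  # a single pass over T frames: O(T * dim) work
        h = [a[i] * h[i] + b[i] * x[i] for i in range(dim)]
        outputs.append([c[i] * h[i] for i in range(dim)])
    return outputs


def pairwise_interactions(num_frames: int) -> int:
    """Count frame-to-frame interactions in full self-attention (T * T)."""
    return num_frames * num_frames


# Toy example: three frames of a 2-channel query feature.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = ssm_scan(frames, a=[0.5, 0.5], b=[1.0, 1.0], c=[1.0, 1.0])
print(fused[-1])  # fused feature after the last frame
```

The recurrence touches each frame once, so its cost grows linearly with the number of frames, while full attention across frames compares every frame with every other frame and grows quadratically.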

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Processing and 3D Reconstruction · Robotics and Sensor-Based Localization