A Simple Baseline for Multi-Camera 3D Object Detection
Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

TL;DR
SimMOD introduces a simple, effective two-stage multi-camera 3D object detection framework that leverages multi-view proposals and feature refinement, achieving state-of-the-art results on nuScenes.
Contribution
The paper presents a novel two-stage multi-camera 3D detection method with proposal refinement and auxiliary training strategies, improving over prior monocular and multi-view approaches.
Findings
Achieves new state-of-the-art performance on nuScenes.
Effectively integrates multi-view information for 3D detection.
Demonstrates the benefits of auxiliary branches and training strategies.
Abstract
3D object detection with surrounding cameras has been a promising direction for autonomous driving. In this paper, we present SimMOD, a Simple baseline for Multi-camera Object Detection, to solve the problem. To incorporate multi-view information as well as build upon previous efforts on monocular 3D object detection, the framework is built on sample-wise object proposals and designed to work in a two-stage manner. First, we extract multi-scale features and generate perspective object proposals on each monocular image. Second, the multi-view proposals are aggregated and then iteratively refined with multi-view and multi-scale visual features in the DETR3D style. The refined proposals are decoded end-to-end into the detection results. To further boost performance, we incorporate auxiliary branches alongside the proposal generation to enhance feature learning. Also, we…
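The abstract does not spell out how refinement gathers multi-view features. In DETR3D, each 3D reference point is projected into every camera image, and features are sampled only from the views where the projection lands inside the image. The following is a minimal sketch of that projection step in plain Python, assuming each camera is described by a 3x4 projection matrix. The function names, the toy matrices, and the image size are hypothetical illustrations, not taken from the SimMOD code.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Matrix3x4 = Sequence[Sequence[float]]  # assumed: intrinsics @ [R | t]


def project_point(point: Point3D, proj: Matrix3x4,
                  width: int, height: int, eps: float = 1e-5):
    """Project one 3D point into one camera; return (u, v, visible)."""
    homog = (point[0], point[1], point[2], 1.0)
    # Each matrix row gives one homogeneous image coordinate.
    u_h, v_h, depth = (sum(m * p for m, p in zip(row, homog)) for row in proj)
    if depth <= eps:  # point is behind (or on) the camera plane
        return 0.0, 0.0, False
    u, v = u_h / depth, v_h / depth
    visible = 0.0 <= u < width and 0.0 <= v < height
    return u, v, visible


def project_to_cameras(points: Sequence[Point3D],
                       cameras: Sequence[Matrix3x4],
                       width: int, height: int
                       ) -> List[List[Tuple[int, float, float]]]:
    """For each point, list (camera_index, u, v) for cameras that see it."""
    hits = []
    for point in points:
        views = []
        for cam_idx, proj in enumerate(cameras):
            u, v, visible = project_point(point, proj, width, height)
            if visible:
                views.append((cam_idx, u, v))
        hits.append(views)
    return hits


# Toy rig (hypothetical): a front camera looking along +z and a rear
# camera looking along -z, both with focal length 100 and a 100x100 image.
front = [[100.0, 0.0, 50.0, 0.0], [0.0, 100.0, 50.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
rear = [[-100.0, 0.0, -50.0, 0.0], [0.0, 100.0, -50.0, 0.0], [0.0, 0.0, -1.0, 0.0]]
points = [(0.0, 0.0, 10.0), (1.0, 0.0, -5.0), (100.0, 0.0, 1.0)]
hits = project_to_cameras(points, [front, rear], width=100, height=100)
```

In this toy setup the first point is seen only by the front camera, the second only by the rear camera, and the third falls outside both images. A full detector would then bilinearly sample multi-scale feature maps at the returned pixel locations and average over the visible views.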
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods
