SVDM: Single-View Diffusion Model for Pseudo-Stereo 3D Object Detection
Yuguang Shi

TL;DR
This paper introduces SVDM, an end-to-end pseudo-stereo 3D detection framework using a diffusion model to improve accuracy and speed, achieving state-of-the-art results on KITTI.
Contribution
The paper presents a novel Single-View Diffusion Model (SVDM) that enables end-to-end training and compatibility with various stereo detectors for pseudo-stereo 3D detection.
Findings
Achieves state-of-the-art performance on the KITTI dataset
Allows end-to-end training of pseudo-stereo 3D detection pipeline
Compatible with most stereo detectors
Abstract
One of the key problems in 3D object detection is to reduce the accuracy gap between methods based on LiDAR sensors and those based on monocular cameras. A recently proposed framework for monocular 3D detection based on Pseudo-Stereo has received considerable attention in the community. However, existing practice has three problems: (1) the monocular depth estimator and the Pseudo-Stereo detector must be trained separately, (2) the framework is difficult to make compatible with different stereo detectors, and (3) the overall computational cost is high, which slows inference. In this work, we propose an end-to-end, efficient pseudo-stereo 3D detection framework by introducing a Single-View Diffusion Model (SVDM) that uses a few iterations to gradually deliver informative right-view pixels to the left image. SVDM allows the entire pseudo-stereo 3D detection pipeline to be…
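The abstract's idea of moving right-view pixels into the left image over a few iterations can be illustrated with a toy loop. This is only a sketch under strong assumptions and not the paper's diffusion process: the function names `toy_right_view` and `iterative_synthesis`, the constant-disparity shift standing in for a learned network, and the linear blending schedule are all hypothetical.

```python
import numpy as np


def toy_right_view(left: np.ndarray, disparity: int) -> np.ndarray:
    """Hypothetical stand-in for a learned view predictor.

    Assumes one constant disparity for the whole image: a point at
    column x in the left view appears at column x - disparity in the
    right view. Columns with no source pixel are padded with the
    last column of the left image.
    """
    width = left.shape[1]
    if not 0 < disparity < width:
        raise ValueError("disparity must satisfy 0 < disparity < image width")
    right = np.empty_like(left)
    right[:, :-disparity] = left[:, disparity:]
    right[:, -disparity:] = left[:, -1:]
    return right


def iterative_synthesis(left: np.ndarray, disparity: int, steps: int = 4) -> np.ndarray:
    """Start from the left image and move toward the toy right view in a few steps.

    The schedule alpha = 1 / (steps - t) closes an equal share of the
    original gap at every step and reaches the target on the last one.
    """
    target = toy_right_view(left, disparity).astype(np.float64)
    x = left.astype(np.float64)
    for t in range(steps):
        alpha = 1.0 / (steps - t)
        x = x + alpha * (target - x)
    return x


if __name__ == "__main__":
    left_image = np.arange(12, dtype=np.float64).reshape(3, 4)
    print(iterative_synthesis(left_image, disparity=1, steps=4))
```

In an actual pseudo-stereo pipeline the synthesized view would be passed to a stereo detector together with the original left image; the toy predictor here only makes the iteration structure concrete.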
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
