Self-supervised 3D Object Detection from Monocular Pseudo-LiDAR
Curie Kim, Ue-Hwan Kim, Jong-Hwan Kim

TL;DR
This paper introduces a novel end-to-end method for 3D object detection and absolute depth prediction using only monocular image sequences, outperforming existing approaches on the KITTI dataset.
Contribution
It presents a new approach that enables absolute depth prediction and 3D detection from monocular images through end-to-end learning, improving accuracy without relying on stereo or LiDAR data.
Findings
Outperforms existing methods on the KITTI 3D dataset
Achieves absolute depth prediction from monocular images
Enhances depth prediction accuracy through end-to-end learning
Abstract
There have been attempts to detect 3D objects by fusing stereo camera images with LiDAR sensor data, or by using LiDAR for pre-training and only monocular images for testing, but there have been fewer attempts to use only monocular image sequences because of their low accuracy. In addition, depth predicted from monocular images alone is scale-inconsistent, which is why researchers are reluctant to rely on monocular images alone. Therefore, we propose a method for predicting absolute depth and detecting 3D objects using only monocular image sequences by enabling end-to-end learning of the detection and depth prediction networks. As a result, the proposed method surpasses existing methods on the KITTI 3D dataset. Even when monocular image and 3D LiDAR are used together during training in an attempt to improve performance, ours exhibit is…
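The abstract does not spell out how a predicted depth map becomes the pseudo-LiDAR named in the title. As a rough sketch only, the standard pinhole back-projection commonly used for pseudo-LiDAR can be written as below. The function name `depth_to_pseudo_lidar`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `max_depth` cutoff are illustrative assumptions, not taken from the paper, and plain Python loops are used for readability where a real pipeline would vectorize.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_pseudo_lidar(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    max_depth: float = 80.0,  # hypothetical cutoff, not from the paper
) -> List[Point]:
    """Back-project a dense depth map into 3D points in the camera frame.

    depth[v][u] is the depth along the optical axis at pixel (u, v).
    Pixels with non-positive depth or depth beyond max_depth are dropped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0 or z > max_depth:
                continue
            # Pinhole camera model: pixel offset from the principal point,
            # scaled by depth over focal length.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map; two pixels are invalid (zero depth, beyond cutoff).
depth_map = [[10.0, 0.0], [20.0, 100.0]]
cloud = depth_to_pseudo_lidar(depth_map, fx=2.0, fy=2.0, cx=0.0, cy=0.0)

# Multiplying the depth map by an unknown factor scales every point by
# the same factor, which is why scale-inconsistent depth yields 3D boxes
# of the wrong size and distance.
doubled = [[2.0 * z for z in row] for row in depth_map]
cloud_doubled = depth_to_pseudo_lidar(
    doubled, fx=2.0, fy=2.0, cx=0.0, cy=0.0, max_depth=160.0
)
```

Because every coordinate is linear in the depth value, any error in the global depth scale carries over directly to the point cloud, which illustrates why absolute (metric) depth matters for a detector that consumes such points.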
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Image Processing Techniques and Applications · Advanced Neural Network Applications
