TL;DR
CubeSLAM is a monocular 3D object SLAM system that detects cuboid objects in single images and jointly optimizes the poses of cameras, objects, and points in a multi-view bundle adjustment. Object constraints improve camera pose estimation in both static and dynamic environments.
Contribution
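The paper combines single-image 3D cuboid detection with multi-view object SLAM in a monocular setup and shows that the two parts improve each other: objects supply long-range geometric and scale constraints for camera pose estimation, and multi-view optimization refines object localization.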
Findings
Outperforms existing methods in 3D detection accuracy on SUN RGBD and KITTI.
Achieves state-of-the-art monocular camera pose estimation on TUM and KITTI datasets.
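Reduces monocular drift through object constraints, and in dynamic environments uses object representation and motion model constraints instead of treating dynamic regions as outliers.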
Abstract
We present a method for single image 3D cuboid object detection and multi-view object SLAM in both static and dynamic environments, and demonstrate that the two parts can improve each other. Firstly for single image object detection, we generate high-quality cuboid proposals from 2D bounding boxes and vanishing points sampling. The proposals are further scored and selected based on the alignment with image edges. Secondly, multi-view bundle adjustment with new object measurements is proposed to jointly optimize poses of cameras, objects and points. Objects can provide long-range geometric and scale constraints to improve camera pose estimation and reduce monocular drift. Instead of treating dynamic regions as outliers, we utilize object representation and motion model constraints to improve the camera pose estimation. The 3D detection experiments on SUN RGBD and KITTI show better…
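The abstract mentions generating cuboid proposals from 2D bounding boxes and sampled vanishing points. As a rough illustration of the vanishing-point part only, the sketch below uses the standard projective-geometry fact that a 3D direction d has its vanishing point at K R d in homogeneous pixel coordinates. It is not the authors' implementation: the function names, the world frame (z up, object rotated about z by a sampled yaw), and the example intrinsics are all assumptions made for illustration.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def vanishing_point(K, R_cam_from_world, direction):
    """Vanishing point (pixels) of a world-frame 3D direction.

    Returns None when the direction is parallel to the image plane,
    i.e. the vanishing point lies at infinity.
    """
    d_cam = mat_vec(R_cam_from_world, direction)
    x, y, w = mat_vec(K, d_cam)
    if abs(w) < 1e-9:
        return None
    return (x / w, y / w)

def cuboid_vanishing_points(K, R_cam_from_world, yaw):
    """Vanishing points of the three cuboid axes for a sampled yaw.

    Assumption: the cuboid stands on the ground plane, so only its
    rotation about the world z (up) axis is free.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    axes = [
        [c, s, 0.0],      # object x axis
        [-s, c, 0.0],     # object y axis
        [0.0, 0.0, 1.0],  # object z axis (up)
    ]
    return [vanishing_point(K, R_cam_from_world, a) for a in axes]

# Hypothetical intrinsics and a level camera: world x -> camera x,
# world y -> camera z (forward), world z (up) -> camera -y.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R_LEVEL = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]

vps = cuboid_vanishing_points(K, R_LEVEL, math.pi / 4)
```

With a level camera the vertical axis has no finite vanishing point, so the function returns None for it, and the two horizontal vanishing points fall on the horizon line through the principal point. A proposal generator along the lines the abstract describes would sample several yaw values (and camera roll and pitch) and then score the resulting cuboids against image edges; that scoring step is not shown here.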
