3D Hierarchical Refinement and Augmentation for Unsupervised Learning of Depth and Pose from Monocular Video
Guangming Wang, Jiquan Zhong, Shijie Zhao, Wenhua Wu, Zhe Liu, Hesheng Wang

TL;DR
This paper introduces a novel unsupervised framework for depth and pose estimation from monocular video, utilizing 3D hierarchical refinement and augmentation to improve accuracy and robustness, achieving state-of-the-art results.
Contribution
The paper proposes a 3D hierarchical refinement and augmentation framework that explicitly leverages 3D geometry for unsupervised depth and pose learning from monocular videos.
Findings
Achieves state-of-the-art depth estimation on the KITTI dataset.
Outperforms recent unsupervised visual odometry methods.
Competitive with geometry-based methods like ORB-SLAM2.
Abstract
Depth and ego-motion estimations are essential for the localization and navigation of autonomous robots and autonomous driving. Recent studies make it possible to learn the per-pixel depth and ego-motion from the unlabeled monocular video. A novel unsupervised training framework is proposed with 3D hierarchical refinement and augmentation using explicit 3D geometry. In this framework, the depth and pose estimations are hierarchically and mutually coupled to refine the estimated pose layer by layer. The intermediate view image is proposed and synthesized by warping the pixels in an image with the estimated depth and coarse pose. Then, the residual pose transformation can be estimated from the new view image and the image of the adjacent frame to refine the coarse pose. The iterative refinement is implemented in a differentiable manner in this paper, making the whole framework optimized…
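The warping and refinement step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`backproject`, `project`, `reproject`, `translation`), the toy intrinsics `K`, the constant depth map, and the composition order `T_residual @ T_coarse` are all assumptions made for illustration. The sketch lifts pixels to 3D using depth, moves them with a coarse pose, and then shows how a residual pose could be composed with the coarse pose to shift the reprojected pixel coordinates.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel to a 3D camera-frame point: X = D * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    return (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)

def project(points, K):
    """Pinhole projection of 3xN points to 2xN pixel coordinates."""
    proj = K @ points
    return proj[:2] / proj[2:3]

def reproject(depth, K, T):
    """Pixel coordinates of each source pixel after applying the 4x4 pose T."""
    pts = backproject(depth, K)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])  # homogeneous 4xN
    return project((T @ pts_h)[:3], K)

def translation(tx, ty, tz):
    """Hypothetical helper: a pure-translation 4x4 rigid transform."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

# Toy inputs (assumed values, not from the paper).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 10.0)

T_coarse = translation(0.5, 0.0, 0.0)    # stand-in for the coarse pose estimate
T_residual = translation(0.1, 0.0, 0.0)  # stand-in for the estimated residual pose

# Assumed convention: the residual pose acts after the coarse pose.
T_refined = T_residual @ T_coarse

coords_coarse = reproject(depth, K, T_coarse)
coords_refined = reproject(depth, K, T_refined)
```

In a full pipeline, coordinates like `coords_coarse` would drive a differentiable image sampler to synthesize the intermediate view, and a pose network would predict the residual pose from that view and the adjacent frame. Both are replaced here by fixed transforms to keep the example self-contained.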
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Human Pose and Action Recognition
Methods: ORB-Simultaneous localization and mapping
