SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning
Yi Feng, Zizhan Guo, Qijun Chen, Rui Fan

TL;DR
SCIPaD incorporates spatial clues into unsupervised joint learning of depth and pose through a confidence-aware feature flow estimator and a positional clue aggregator, reducing camera pose translation error by 22.2% and angular error by 34.8% on the KITTI dataset.
Contribution
Proposes an approach that incorporates spatial clues through a confidence-aware feature flow estimator and a positional clue aggregator for robust pose-depth learning.
Findings
Achieves a 22.2% reduction in translation error on the KITTI dataset.
Achieves a 34.8% reduction in angular error on the KITTI dataset.
Outperforms state-of-the-art methods in unsupervised depth-pose estimation.
Abstract
Unsupervised monocular depth estimation frameworks have shown promising performance in autonomous driving. However, existing solutions primarily rely on a simple convolutional neural network for ego-motion recovery, which struggles to estimate precise camera poses in dynamic, complicated real-world scenarios. These inaccurately estimated camera poses can inevitably deteriorate the photometric reconstruction and mislead the depth estimation networks with wrong supervisory signals. In this article, we introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning. Specifically, a confidence-aware feature flow estimator is proposed to acquire 2D feature positional translations and their associated confidence levels. Meanwhile, we introduce a positional clue aggregator, which integrates pseudo 3D point clouds from DepthNet and 2D feature flows…
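The abstract mentions pseudo 3D point clouds derived from DepthNet output. One common way to obtain such a point cloud from a predicted depth map is pinhole back-projection, sketched below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth map are all hypothetical.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift every pixel (u, v) with depth d to a camera-frame 3D point.

    Assumes an ideal pinhole camera (no distortion):
        X = (u - cx) * d / fx
        Y = (v - cy) * d / fy
        Z = d
    Points are returned in row-major pixel order.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


# Hypothetical 2x2 depth map and intrinsics, chosen only for illustration.
depth_map = [[2.0, 2.0], [4.0, 4.0]]
cloud = backproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each entry of `cloud` is the 3D position of one pixel in the camera frame. In a pose-depth pipeline, points like these could be paired with 2D feature displacements between frames to constrain the camera motion, though how SCIPaD combines them is not specified in the excerpt above.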
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Advanced Vision and Imaging
