TL;DR
Depth2Pose is a pose-based evaluation framework for monocular depth estimation. It scores depth quality by how accurately relative camera pose can be estimated from the predicted depth, which reduces reliance on dense ground-truth depth data.
Contribution
The paper presents a novel pose-based evaluation framework and dataset for monocular depth estimation, enabling assessment without dense ground-truth depth.
Findings
Methods that perform well on standard depth metrics may not generalize to challenging scenes.
The pose-based metric correlates with downstream task performance.
The framework is applicable to scenes where ground-truth depth is hard to obtain.
Abstract
Monocular depth estimation has improved significantly in recent years, driven by increasingly powerful models and large-scale training data. Predicted depth is increasingly used as an input signal for downstream tasks such as Structure-from-Motion (SfM), visual localization, and SLAM. However, monocular depth estimators (MDEs) are still primarily evaluated in terms of depth accuracy. Standard metrics aggregate errors globally and may not reflect the usefulness of depth for downstream geometric tasks. We therefore propose Depth2Pose, a framework for evaluating MDEs in the context of downstream tasks. By combining depth predictions with feature correspondences in depth-aware geometric solvers, we use relative camera pose estimation accuracy as a task-driven proxy for depth quality. Traditional benchmarks require dense ground truth in the form of per-pixel depth, which is expensive to…
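To make the idea of pose accuracy as a proxy for depth quality concrete, here is a minimal sketch, not the paper's pipeline. It assumes metric depth sampled at matched keypoints, a shared pinhole intrinsic matrix K, and a known ground-truth relative pose. It swaps in a plain Kabsch (3D-to-3D rigid alignment) fit in place of the depth-aware geometric solvers mentioned in the abstract. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def backproject(kpts, depth, K):
    """Lift pixel keypoints (N, 2) with per-keypoint depth (N,) to 3D camera coordinates."""
    homog = np.hstack([kpts, np.ones((kpts.shape[0], 1))])
    rays = (np.linalg.inv(K) @ homog.T).T  # rays with z = 1
    return rays * depth[:, None]

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) such that Q ~= R @ P + t (stand-in solver)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def rotation_error_deg(R_est, R_gt):
    """Angle of the residual rotation between estimate and ground truth, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def pose_errors_from_depth(kpts1, kpts2, d1, d2, K, R_gt, t_gt):
    """Score depth values by the relative pose they induce on matched keypoints."""
    R_est, t_est = kabsch(backproject(kpts1, d1, K), backproject(kpts2, d2, K))
    return rotation_error_deg(R_est, R_gt), float(np.linalg.norm(t_est - t_gt))

def demo(seed=0, n=200):
    """Synthetic two-view scene: compare exact depth against depth with 10% noise."""
    rng = np.random.default_rng(seed)
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    a = np.radians(10.0)  # ground-truth rotation: 10 degrees about the y-axis
    R_gt = np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])
    t_gt = np.array([0.5, 0.0, 0.1])

    # Random 3D points in front of camera 1, then expressed in camera 2.
    X1 = np.column_stack([rng.uniform(-1, 1, n), rng.uniform(-1, 1, n), rng.uniform(2, 6, n)])
    X2 = X1 @ R_gt.T + t_gt

    def project(X):
        uvw = X @ K.T
        return uvw[:, :2] / uvw[:, 2:3]

    k1, k2 = project(X1), project(X2)
    d1, d2 = X1[:, 2], X2[:, 2]

    clean = pose_errors_from_depth(k1, k2, d1, d2, K, R_gt, t_gt)
    # Simulated depth-estimator error: independent multiplicative noise per keypoint.
    noisy = pose_errors_from_depth(
        k1, k2,
        d1 * (1 + 0.1 * rng.standard_normal(n)),
        d2 * (1 + 0.1 * rng.standard_normal(n)),
        K, R_gt, t_gt,
    )
    return clean, noisy

if __name__ == "__main__":
    clean, noisy = demo()
    print("exact depth (rot deg, trans):", clean)
    print("noisy depth (rot deg, trans):", noisy)
```

In this toy setup, exact depth recovers the ground-truth pose up to numerical precision, while perturbed depth produces a measurable pose error. That is the sense in which pose accuracy can stand in for depth quality. The sketch needs only a ground-truth relative pose and keypoint matches, not a dense depth map. Real monocular depth is often only defined up to scale or an affine transform, which this sketch does not handle.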
