Depth2Pose: A Pose-Based Benchmark for Monocular Depth Estimation without Ground-Truth Depth

Viktor Kocur; Sithu Aung; Gabrielle Flood; Yaqing Ding; Lukas Bujnak; Torsten Sattler; Zuzana Kukelova

arXiv:2605.19797 · cs.CV · May 20, 2026

TL;DR

Depth2Pose introduces a pose-based evaluation framework for monocular depth estimation that measures depth quality by the accuracy of relative camera poses estimated from predicted depth and feature correspondences, reducing reliance on dense ground-truth depth.

Contribution

The paper presents a novel pose-based evaluation framework and dataset for monocular depth estimation, enabling assessment without dense ground-truth depth.

Findings

1. Methods that perform well on standard depth metrics may not generalize to challenging scenes.
2. The pose-based metric correlates with downstream task performance.
3. The framework is applicable to scenes where ground-truth depth is hard to obtain.

Abstract

Monocular depth estimation has improved significantly in recent years, driven by increasingly powerful models and large-scale training data. Predicted depth is increasingly used as an input signal for downstream tasks such as Structure-from-Motion (SfM), visual localization, and SLAM. However, monocular depth estimators (MDEs) are still primarily evaluated in terms of depth accuracy. Standard metrics aggregate errors globally and may not reflect the usefulness of depth for downstream geometric tasks. We therefore propose Depth2Pose, a framework for evaluating MDEs in the context of downstream tasks. By combining depth predictions with feature correspondences in depth-aware geometric solvers, we use relative camera pose estimation accuracy as a task-driven proxy for depth quality. Traditional benchmarks require dense ground truth in the form of per-pixel depth, which is expensive to…
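The core idea in the abstract, turning predicted depth plus feature matches into a relative camera pose and then scoring that pose, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's depth-aware solvers: it assumes matched pixel pairs, known intrinsics, and depth that is correct up to an unknown per-image scale (no shift, no noise), and it swaps in a plain Umeyama similarity alignment with no RANSAC. All function names (`backproject`, `similarity_align`, `pose_from_depth`, `rotation_error_deg`, `translation_error_deg`) are hypothetical.

```python
import numpy as np

def backproject(pixels, depth, K):
    """Lift N pixels (N, 2) with per-pixel depth (N,) to 3D points (N, 3) in the camera frame."""
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])  # homogeneous pixel coordinates
    rays = homog @ np.linalg.inv(K).T                       # rays K^-1 p, each with z = 1
    return rays * depth[:, None]                            # scale each ray by its depth

def similarity_align(src, dst):
    """Umeyama (1991): s, R, t minimizing sum ||dst_i - (s R src_i + t)||^2."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_src, dst - mu_dst
    cov = xd.T @ xs / len(src)                # 3x3 cross-covariance of centered points
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                        # force a proper rotation, not a reflection
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (xs ** 2).sum(axis=1).mean()
    t = mu_dst - s * R @ mu_src
    return s, R, t

def pose_from_depth(px1, px2, d1, d2, K1, K2):
    """Hypothetical helper: pose (R, t) mapping camera-1 points into camera 2, from matches + depth."""
    X1 = backproject(px1, d1, K1)
    X2 = backproject(px2, d2, K2)
    _, R, t = similarity_align(X1, X2)        # s absorbs the ratio of the two unknown depth scales
    return R, t

def rotation_error_deg(R_est, R_gt):
    """Angle of the residual rotation R_est^T R_gt, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_deg(t_est, t_gt):
    """Angle between translation directions, in degrees (the length of t is not comparable)."""
    cos = t_est @ t_gt / (np.linalg.norm(t_est) * np.linalg.norm(t_gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    X1 = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 8.0], size=(50, 3))  # points in camera-1 frame
    angle = np.radians(10.0)
    R_gt = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(angle), 0.0, np.cos(angle)]])
    t_gt = np.array([0.5, 0.1, 0.2])
    X2 = X1 @ R_gt.T + t_gt
    px1 = (X1 @ K.T)[:, :2] / X1[:, 2:]       # project into image 1
    px2 = (X2 @ K.T)[:, :2] / X2[:, 2:]       # project into image 2
    # Simulate a scale-ambiguous depth estimator: each image gets its own unknown scale.
    R, t = pose_from_depth(px1, px2, 2.5 * X1[:, 2], 0.7 * X2[:, 2], K, K)
    print(rotation_error_deg(R, R_gt), translation_error_deg(t, t_gt))
```

Because each depth map may carry its own unknown scale, the similarity scale `s` absorbs the ratio between them, so only the direction of `t` can be compared with ground truth; that is why the translation error is an angle. Real MDE outputs with per-image shift, noise, or outlier matches would call for a more robust solver inside RANSAC, which this sketch does not attempt.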

Code & Models

Repositories

https://kocurvik.github.io/depth2pose