TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Dual-Level Scale-Oriented Contrast
Beilei Cui, Yiming Huang, Long Bai, Hongliang Ren

TL;DR
TR2M is a framework that uses language descriptions and dual-level scale-oriented contrastive learning to transfer relative depth to metric depth across diverse domains, improving generalization and zero-shot performance.
Contribution
The paper proposes TR2M, a new method combining text and image inputs with scale-oriented contrastive learning to improve metric depth estimation from relative depth data.
Findings
TR2M achieves state-of-the-art results on multiple datasets.
The method demonstrates strong zero-shot generalization.
It effectively reduces scale uncertainty in depth estimation.
Abstract
This work presents a generalizable framework to transfer relative depth to metric depth. Current monocular depth estimation methods fall mainly into two groups: metric depth estimation (MMDE) and relative depth estimation (MRDE). MMDE methods estimate depth in metric scale but are often limited to a specific domain. MRDE methods generalize well across domains, but their scale is uncertain, which hinders downstream applications. To this end, we aim to build a framework that resolves scale uncertainty and transfers relative depth to metric depth. Previous methods used language as input and estimated two factors for rescaling. Our approach, TR2M, takes both a text description and an image as inputs and estimates two rescale maps to transfer relative depth to metric depth at the pixel level. Features from the two modalities are fused with a cross-modality attention module to better capture scale…
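The difference between rescaling with two global factors and rescaling with two per-pixel maps can be illustrated with a small sketch. It assumes the two quantities act as a scale and a shift in an affine transform (metric = scale × relative + shift); the abstract does not state the exact form, so this is an assumption, and the function names `rescale_global` and `rescale_pixelwise` are hypothetical. The maps here are hand-written toy values, not network outputs.

```python
from typing import List

DepthMap = List[List[float]]  # toy H x W depth map as nested lists


def rescale_global(relative: DepthMap, scale: float, shift: float) -> DepthMap:
    """Apply one scale and one shift to every pixel (image-level rescaling)."""
    return [[scale * d + shift for d in row] for row in relative]


def rescale_pixelwise(
    relative: DepthMap, scale_map: DepthMap, shift_map: DepthMap
) -> DepthMap:
    """Apply a separate scale and shift at each pixel.

    Assumption: the two rescale maps are combined affinely with the
    relative depth; this form is not confirmed by the abstract.
    """
    if not (len(relative) == len(scale_map) == len(shift_map)):
        raise ValueError("all maps must have the same height")
    out = []
    for d_row, s_row, t_row in zip(relative, scale_map, shift_map):
        if not (len(d_row) == len(s_row) == len(t_row)):
            raise ValueError("all maps must have the same width")
        out.append([s * d + t for d, s, t in zip(d_row, s_row, t_row)])
    return out


# Toy 2 x 2 example: relative depth in [0, 1], maps chosen by hand.
relative = [[0.0, 0.5], [0.5, 1.0]]
scale_map = [[2.0, 2.0], [4.0, 4.0]]
shift_map = [[1.0, 1.0], [0.5, 0.5]]

metric_global = rescale_global(relative, scale=2.0, shift=1.0)
metric_pixel = rescale_pixelwise(relative, scale_map, shift_map)
```

In the global case, the two pixels with relative depth 0.5 always map to the same metric value. With per-pixel maps they can differ, which is the extra flexibility that pixel-level rescaling provides.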
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Learning
