TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Dual-Level Scale-Oriented Contrast

Beilei Cui; Yiming Huang; Long Bai; Hongliang Ren

arXiv:2506.13387 · cs.CV · March 20, 2026

TL;DR

TR2M is a framework that uses language descriptions and dual-level, scale-oriented contrastive learning to transfer relative depth to metric depth across diverse domains, improving generalization and zero-shot performance.

Contribution

The paper proposes TR2M, a method that combines text and image inputs with scale-oriented contrastive learning to recover metric depth from relative depth predictions.
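The summary does not define the scale-oriented contrastive objective, so the following is only a minimal sketch under stated assumptions. It treats samples whose median depth falls into the same depth bin as positives and applies a standard supervised contrastive (SupCon-style) loss to their L2-normalized features. The helper names `depth_bin` and `scale_contrastive_loss`, the bin edges, and the temperature are hypothetical, and the sketch shows a single level rather than the paper's dual-level formulation.

```python
import math
from typing import Dict, List, Sequence


def depth_bin(median_depth: float, edges: Sequence[float]) -> int:
    """Hypothetical helper: index of the bin (sorted edges, e.g. meters) containing median_depth."""
    for i, edge in enumerate(edges):
        if median_depth < edge:
            return i
    return len(edges)


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def scale_contrastive_loss(features: List[List[float]], bins: List[int],
                           temperature: float = 0.1) -> float:
    """Hypothetical SupCon-style loss: same depth bin = positive, other bins = negatives."""
    if len(features) != len(bins):
        raise ValueError("one bin label per feature vector is required")
    z = [_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and bins[j] == bins[i]]
        if not positives:
            continue  # an anchor with no same-bin partner contributes nothing
        # Temperature-scaled cosine similarities to every other sample.
        sims: Dict[int, float] = {j: _dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log of the softmax denominator (log-sum-exp).
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With features `[[1, 0], [1, 0.1], [0, 1], [0.1, 1]]`, labeling the first two and the last two as sharing a bin yields a lower loss than pairing dissimilar vectors, which is the behavior such an objective is meant to encourage: features of samples at a similar metric scale are pulled together.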

Findings

1. TR2M achieves state-of-the-art results on multiple datasets.
2. The method demonstrates strong zero-shot generalization.
3. It effectively reduces scale uncertainty in depth estimation.

Abstract

This work presents a generalizable framework for transferring relative depth to metric depth. Current monocular depth estimation methods fall into two groups: monocular metric depth estimation (MMDE) and monocular relative depth estimation (MRDE). MMDE methods predict depth in metric units but are often limited to a specific domain. MRDE methods generalize well across domains, but their predictions have an unknown scale, which hinders downstream applications. We therefore build a framework that resolves this scale uncertainty and transfers relative depth to metric depth. Previous methods took language as input and estimated two factors to rescale the relative depth. Our approach, TR2M, takes both a text description and an image as inputs and estimates two rescale maps that transfer relative depth to metric depth at the pixel level. Features from the two modalities are fused with a cross-modality attention module to better capture scale…
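The abstract contrasts two rescaling strategies: earlier methods estimate two factors for rescaling, while TR2M estimates two rescale maps at the pixel level. A minimal sketch, assuming the two factors are one global scale and shift and the two maps are their per-pixel counterparts, applied directly to relative depth. The paper may instead work on inverse depth, and in TR2M the maps are predicted by a network rather than supplied by hand; `rescale_global`, `rescale_pixelwise`, and `DepthMap` are hypothetical names.

```python
from typing import List

DepthMap = List[List[float]]  # row-major H x W grid of values


def rescale_global(rel: DepthMap, scale: float, shift: float) -> DepthMap:
    """Two-factor rescaling: one scale and one shift shared by every pixel."""
    return [[scale * d + shift for d in row] for row in rel]


def rescale_pixelwise(rel: DepthMap, scale_map: DepthMap, shift_map: DepthMap) -> DepthMap:
    """Pixel-level rescaling: each pixel gets its own scale and shift."""
    if not (len(rel) == len(scale_map) == len(shift_map)):
        raise ValueError("all maps must have the same height")
    out: DepthMap = []
    for r_row, s_row, t_row in zip(rel, scale_map, shift_map):
        if not (len(r_row) == len(s_row) == len(t_row)):
            raise ValueError("all maps must have the same width")
        # Element-wise affine transform: metric = scale * relative + shift.
        out.append([s * d + t for d, s, t in zip(r_row, s_row, t_row)])
    return out
```

For `rel = [[0.2, 0.4], [0.6, 0.8]]`, `rescale_global(rel, 10.0, 1.0)` gives approximately `[[3.0, 5.0], [7.0, 9.0]]`. A single global pair cannot give the bottom row a different scale from the top row; `rescale_pixelwise` with scale map `[[10, 10], [5, 5]]` and shift map `[[1, 1], [0, 0]]` can, yielding approximately `[[3.0, 5.0], [3.0, 4.0]]`. That extra per-pixel freedom is the motivation for predicting maps instead of two scalars.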

Code & Models

Repositories

beileicui/tr2m (official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Learning