BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment

Tongfan Guan; Jiaxin Guo; Chen Wang; Yun-Hui Liu

arXiv:2508.04611·cs.CV·August 14, 2025

TL;DR

BridgeDepth introduces a unified framework that aligns monocular and stereo depth reasoning through iterative latent synchronization, improving zero-shot generalization and robustness on challenging surfaces such as transparent and reflective ones.

Contribution

The paper proposes a novel cross-attentive alignment mechanism that dynamically synchronizes monocular and stereo representations within a single network.

Findings

01. Reduces zero-shot generalization error by over 40% on Middlebury and ETH3D datasets.

02. Addresses failures on transparent and reflective surfaces.

03. Achieves state-of-the-art results in monocular-stereo depth estimation.
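
As a reading aid for the "over 40%" figure, the sketch below shows how a relative error reduction is typically computed. The function name and the two error values are hypothetical placeholders chosen for illustration. They are not results from the paper, and the page does not say which error metric the figure refers to.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error improves on baseline_error (0.4 means 40%)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical placeholder values, NOT numbers reported in the paper.
baseline, improved = 5.0, 2.9
reduction = relative_error_reduction(baseline, improved)
print(f"{reduction:.0%}")  # prints "42%"
```

Under this definition, a reduction of over 40% means the new error is below 60% of the baseline error on the same metric.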

Abstract

Monocular and stereo depth estimation offer complementary strengths: monocular methods capture rich contextual priors but lack geometric precision, while stereo approaches leverage epipolar geometry yet struggle with ambiguities such as reflective or textureless surfaces. Despite post-hoc synergies, these paradigms remain largely disjoint in practice. We introduce a unified framework that bridges both through iterative bidirectional alignment of their latent representations. At its core, a novel cross-attentive alignment mechanism dynamically synchronizes monocular contextual cues with stereo hypothesis representations during stereo reasoning. This mutual alignment resolves stereo ambiguities (e.g., specular surfaces) by injecting monocular structure priors while refining monocular depth with stereo geometry within a single network. Extensive experiments demonstrate state-of-the-art…
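
The abstract describes iterative bidirectional alignment driven by cross-attention. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. Every name here (`cross_attend`, `bidirectional_align`, the parameter dictionary keys), the single-head design, the residual update, the simultaneous update order, and the iteration count are assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: `queries` read from `context`.

    Returns the residually updated queries and the attention weights.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return queries + attn @ v, attn


def bidirectional_align(mono, stereo, params, num_iters=3):
    """Alternate information flow between two latent token sets.

    Assumption: both directions are computed from the previous iteration's
    latents and applied together; the paper may order the updates differently.
    """
    for _ in range(num_iters):
        stereo_new, _ = cross_attend(stereo, mono, *params["stereo_from_mono"])
        mono_new, _ = cross_attend(mono, stereo, *params["mono_from_stereo"])
        mono, stereo = mono_new, stereo_new
    return mono, stereo


# Toy inputs: 16 tokens with 8 channels each (hypothetical sizes).
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
mono = rng.standard_normal((n_tokens, dim))
stereo = rng.standard_normal((n_tokens, dim))
params = {
    name: tuple(rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
    for name in ("stereo_from_mono", "mono_from_stereo")
}
mono_out, stereo_out = bidirectional_align(mono, stereo, params)
print(mono_out.shape, stereo_out.shape)
```

In this sketch, the stereo latents query the monocular latents (injecting contextual priors) while the monocular latents query the stereo latents (injecting geometric cues), which mirrors the mutual alignment the abstract describes at a conceptual level only.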

Code & Models

Models

aeolusguan/BridgeDepth (Hugging Face model)