Bridging Geometric and Semantic Foundation Models for Generalized Monocular Depth Estimation

Sanggyun Ma; Wonjoon Choi; Jihun Park; Jaeyeul Kim; Seunghun Lee; Jiwan Seo; Sunghoon Im

arXiv:2505.23400 · cs.CV · February 27, 2026

TL;DR

BriGeS combines geometric (depth) and semantic (segmentation) foundation models through a Bridging Gate and Attention Temperature Scaling to improve monocular depth estimation across diverse and complex scenes.

Contribution

The paper proposes BriGeS, a method that fuses geometric and semantic foundation models while training only the Bridging Gate, improving the generalization of monocular depth estimation at a low training cost.
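The central training idea is that the pre-trained depth and segmentation foundation models are reused as-is and only the Bridging Gate is trained. This summary does not describe the gate's internal design, so the sketch below substitutes a hypothetical per-channel sigmoid gate that blends a depth feature vector with a semantic one, plus a registry marking which modules would be optimized. The names `gated_fusion`, `select_trainable`, `MODULES`, and the gate parameters `w_depth`, `w_sem`, `bias` are illustrative assumptions, not the authors' code.

```python
import math

# Hypothetical module registry: both foundation models stay frozen and
# only the gate's parameters would be handed to the optimizer.
MODULES = {
    "depth_foundation_model": {"trainable": False},
    "segmentation_foundation_model": {"trainable": False},
    "bridging_gate": {"trainable": True},
}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(f_depth, f_sem, gate):
    """Blend depth and semantic features channel by channel.

    `gate` holds hypothetical per-channel parameters "w_depth", "w_sem"
    and "bias"; the gate value g in (0, 1) sets how much of each source
    reaches the fused feature: g * depth + (1 - g) * semantic.
    """
    fused = []
    for i, (d, s) in enumerate(zip(f_depth, f_sem)):
        g = sigmoid(gate["w_depth"][i] * d + gate["w_sem"][i] * s + gate["bias"][i])
        fused.append(g * d + (1.0 - g) * s)
    return fused


def select_trainable(modules):
    """Names of the modules whose parameters would be optimized."""
    return [name for name, cfg in modules.items() if cfg["trainable"]]


if __name__ == "__main__":
    zero_gate = {"w_depth": [0.0, 0.0], "w_sem": [0.0, 0.0], "bias": [0.0, 0.0]}
    print(gated_fusion([2.0, 4.0], [0.0, 2.0], zero_gate))  # [1.0, 3.0]
    print(select_trainable(MODULES))  # ['bridging_gate']
```

With zero weights and bias the gate outputs 0.5 and simply averages the two sources; a large positive bias lets depth features pass through almost unchanged. In a deep-learning framework the same frozen/trainable split would be expressed by disabling gradients on the backbone parameters and passing only the gate's parameters to the optimizer.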

Findings

01

Outperforms state-of-the-art methods on multiple datasets

02

Efficient training: only the Bridging Gate is trained

03

Effectively handles complex scenes with overlapping objects

Abstract

We present Bridging Geometric and Semantic (BriGeS), an effective method that fuses geometric and semantic information within foundation models to enhance Monocular Depth Estimation (MDE). Central to BriGeS is the Bridging Gate, which integrates the complementary strengths of depth and segmentation foundation models. This integration is further refined by our Attention Temperature Scaling technique. It finely adjusts the focus of the attention mechanisms to prevent over-concentration on specific features, thus ensuring balanced performance across diverse inputs. BriGeS capitalizes on pre-trained foundation models and adopts a strategy that focuses on training only the Bridging Gate. This method significantly reduces resource demands and training time while maintaining the model's ability to generalize effectively. Extensive experiments across multiple challenging datasets demonstrate…
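The abstract does not give the exact formulation of Attention Temperature Scaling. A common way to realize the behavior it describes (softening attention so it does not over-concentrate on a few features) is to divide the attention logits by a temperature τ > 1 before the softmax, i.e. softmax(Q·Kᵀ / (τ·√d)). The sketch below assumes that form for a single query; `attention`, `softmax`, and `temperature` are illustrative names, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over `logits` after dividing them by `temperature`.

    temperature > 1 flattens the distribution; temperature < 1 sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values, temperature=1.0):
    """Single-query scaled dot-product attention with an extra temperature.

    Computes softmax(q . k_i / (temperature * sqrt(d))) and returns the
    weighted sum of `values` together with the attention weights.
    """
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(logits, temperature)
    dim_v = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return output, weights


if __name__ == "__main__":
    q = [4.0, 0.0]
    ks = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    vs = [[1.0], [0.0], [0.5]]
    for tau in (0.5, 1.0, 2.0, 4.0):
        _, w = attention(q, ks, vs, temperature=tau)
        print(f"tau={tau}: max weight={max(w):.3f}")
```

Raising τ lowers the largest attention weight and spreads probability mass over the other keys, which matches the "prevent over-concentration" behavior the abstract describes. Whether BriGeS uses a fixed or learned temperature, and in which attention layers it is applied, is not specified in this summary.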

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Robotics and Sensor-Based Localization

Methods: Softmax · Attention Is All You Need · Focus