Monocular Depth Estimation with Global-Aware Discretization and Local Context Modeling

Heng Wu; Qian Zhang; Guixu Zhang

arXiv:2508.03186 · cs.CV · August 6, 2025


TL;DR

This paper introduces a monocular depth estimation approach that combines local and global cues through two dedicated modules, the Gated Large Kernel Attention Module (GLKAM) and the Global Bin Prediction Module (GBPM), and reports state-of-the-art accuracy on the NYU-V2 and KITTI datasets.

Contribution

The paper proposes the Gated Large Kernel Attention Module (GLKAM) and the Global Bin Prediction Module (GBPM), which strengthen local and global feature extraction, respectively, for monocular depth estimation.

Findings

1. Achieves state-of-the-art performance on the NYU-V2 and KITTI datasets.
2. Effectively captures multi-scale local structural information.
3. Provides structural guidance through global depth distribution estimation.
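The multi-scale local structure finding is credited to GLKAM, which the abstract describes as large kernel convolutions combined with a gated mechanism. The exact layer layout is not given here, so the following single-channel, pure-Python sketch assumes the Large Kernel Attention decomposition from the Visual Attention Network (a 5×5 depthwise convolution, then a 7×7 depthwise convolution with dilation 3, then a 1×1 projection) and adds a hypothetical per-pixel sigmoid gate. The names `conv2d_same`, `gated_large_kernel_attention`, `w_point`, `w_gate`, and `b_gate` are illustrative assumptions, not the authors' code.

```python
import math
from typing import List

Grid = List[List[float]]  # a single-channel feature map, indexed [row][col]


def conv2d_same(x: Grid, kernel: Grid, dilation: int = 1) -> Grid:
    """Zero-padded, dilated 2D cross-correlation that keeps the input size.

    Assumes a square kernel with an odd side length.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    r = (k // 2) * dilation  # distance from the kernel centre to its outermost tap
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii = i + a * dilation - r
                    jj = j + b * dilation - r
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the map
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gated_large_kernel_attention(
    x: Grid,
    k_local: Grid,
    k_dilated: Grid,
    dilation: int,
    w_point: float,
    w_gate: float,
    b_gate: float,
) -> Grid:
    # 1) Large receptive field: a small local conv followed by a dilated conv
    #    (LKA-style decomposition, assumed here).
    attn = conv2d_same(conv2d_same(x, k_local), k_dilated, dilation)
    # 2) Pointwise (1x1) projection; with one channel this is a scalar weight.
    attn = [[w_point * v for v in row] for row in attn]
    # 3) Hypothetical gate: a per-pixel sigmoid that scales the attended features.
    out: Grid = []
    for i, row in enumerate(x):
        out.append([
            sigmoid(w_gate * v + b_gate) * attn[i][j] * v
            for j, v in enumerate(row)
        ])
    return out


if __name__ == "__main__":
    size = 9
    impulse = [[0.0] * size for _ in range(size)]
    impulse[4][4] = 1.0
    ones5 = [[1.0] * 5 for _ in range(5)]
    ones7 = [[1.0] * 7 for _ in range(7)]
    # A single 5x5 conv cannot see the centre impulse from the corner ...
    print(conv2d_same(impulse, ones5)[0][0])  # 0.0
    # ... but 5x5 followed by 7x7 with dilation 3 can (effective radius 2 + 9 = 11).
    print(conv2d_same(conv2d_same(impulse, ones5), ones7, dilation=3)[0][0])  # 4.0
```

The corner check illustrates why a dilated kernel is stacked on a small one: the pair covers a 23×23 footprint with 25 + 49 = 74 weights instead of the 529 a dense kernel of that size would need. How the paper actually arranges its multi-scale branches and gate is not stated in this summary.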

Abstract

Accurate monocular depth estimation remains a challenging problem due to the inherent ambiguity that stems from the ill-posed nature of recovering 3D structure from a single view, where multiple plausible depth configurations can produce identical 2D projections. In this paper, we present a novel depth estimation method that combines both local and global cues to improve prediction accuracy. Specifically, we propose the Gated Large Kernel Attention Module (GLKAM) to effectively capture multi-scale local structural information by leveraging large kernel convolutions with a gated mechanism. To further enhance the global perception of the network, we introduce the Global Bin Prediction Module (GBPM), which estimates the global distribution of depth bins and provides structural guidance for depth regression. Extensive experiments on the NYU-V2 and KITTI datasets demonstrate that our method…
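The abstract says GBPM estimates a global distribution of depth bins that guides depth regression, but it does not give the formulation. One common way to realise bin-based regression is the AdaBins-style scheme (Bhat et al., 2021): an image-level head predicts normalized bin widths over a fixed depth range, and each pixel's depth is the probability-weighted mean of the bin centres. The sketch below assumes that scheme as a stand-in; `bin_centers`, `pixel_depth`, the logits, and the depth ranges are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(width_logits: Sequence[float], d_min: float, d_max: float) -> List[float]:
    """Turn image-level logits into adaptive bin centres spanning [d_min, d_max]."""
    # Normalized widths sum to 1, so the scaled widths exactly tile the depth range.
    widths = [p * (d_max - d_min) for p in softmax(width_logits)]
    centers: List[float] = []
    left_edge = d_min
    for w in widths:
        centers.append(left_edge + w / 2.0)  # centre of the current bin
        left_edge += w
    return centers


def pixel_depth(pixel_logits: Sequence[float], centers: Sequence[float]) -> float:
    """Depth as the probability-weighted average of the bin centres."""
    probs = softmax(pixel_logits)
    return sum(p * c for p, c in zip(probs, centers))


if __name__ == "__main__":
    # Hypothetical values: 4 bins over an assumed 0-10 m range.
    centers = bin_centers([0.0, 0.0, 0.0, 0.0], d_min=0.0, d_max=10.0)
    print(centers)  # [1.25, 3.75, 6.25, 8.75]
    print(pixel_depth([0.0, 0.0, 0.0, 0.0], centers))  # 5.0
    print(pixel_depth([0.0, 0.0, 0.0, 20.0], centers))  # very close to 8.75
```

Because the bin centres are shared by every pixel in the image, they act as an image-level prior on where depth values are likely to fall, and each per-pixel prediction is a convex combination of those centres that always stays inside [d_min, d_max].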
