Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth
Doyeon Kim, Woonghyun Ka, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, and Junmo Kim

TL;DR
This paper introduces a novel hierarchical transformer-based network with a specialized decoder and improved augmentation techniques for monocular depth estimation, achieving state-of-the-art results on NYU Depth V2.
Contribution
The paper presents a new global-local path network with a lightweight decoder and enhanced training strategies for improved depth prediction accuracy.
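Vertical CutDepth, named in the title, is the augmentation behind these training strategies. It is understood to extend CutDepth, which pastes a region of the ground-truth depth map into the input RGB image, by making the pasted region span the full image height so that the vertical layout of the scene is preserved. The following is a minimal NumPy sketch under stated assumptions: the depth map is already scaled to the image's value range, the strip's left edge and width are drawn uniformly at random, and `p` is a hypothetical application probability. The paper's exact sampling rule may differ.

```python
import numpy as np

def vertical_cutdepth(image, depth, p=0.5, rng=None):
    """Replace a random full-height vertical strip of `image` with `depth`.

    image: (H, W, 3) float array, assumed scaled to [0, 1].
    depth: (H, W) float array, assumed normalised to the same range.
    p: hypothetical probability of applying the augmentation.
    Returns a new array; the inputs are not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    if rng.random() >= p:
        return out  # augmentation skipped for this sample
    _, width_px = depth.shape
    alpha, beta = rng.random(2)  # assumed uniform sampling of position and width
    left = int(alpha * width_px)  # 0 <= left < W
    strip = max(int((width_px - left) * beta), 1)  # at least one column, never past W
    # Rows are not sliced: the strip covers the full height, so every pasted
    # depth value stays at its original vertical position.
    out[:, left:left + strip, :] = depth[:, left:left + strip, None]
    return out
```

Sampling the width relative to the columns remaining to the right of `left` keeps the strip inside the frame without any rejection step.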
Findings
Achieves state-of-the-art performance on the NYU Depth V2 dataset.
Demonstrates better generalization and robustness than existing models.
Offers a computationally efficient decoder with superior detail recovery.
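This page does not list the metrics behind the state-of-the-art claim. Results on NYU Depth V2 are conventionally reported as absolute relative error, RMSE, log10 error, and threshold accuracies (the fraction of pixels with max(pred/gt, gt/pred) < 1.25^k for k = 1, 2, 3). The sketch below computes these in NumPy, assuming depths in metres and a 10 m evaluation cap; the cap is a common NYU Depth V2 convention and is not taken from this page.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=10.0):
    """Common monocular-depth error and threshold-accuracy metrics.

    pred, gt: arrays of depths in metres with the same shape.
    Only pixels whose ground truth lies in (min_depth, max_depth) are scored;
    the 10 m cap is an assumed NYU Depth V2 convention.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = (gt > min_depth) & (gt < max_depth)
    p = np.clip(pred[valid], min_depth, max_depth)  # avoid log(0) and division by zero
    g = gt[valid]
    ratio = np.maximum(p / g, g / p)
    return {
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
        "log10": float(np.mean(np.abs(np.log10(p) - np.log10(g)))),
        "delta1": float(np.mean(ratio < 1.25)),
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
    }
```

Lower is better for the error terms, and higher is better for the delta accuracies.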
Abstract
Depth estimation from a single image is an important task that can be applied to various fields in computer vision, and has grown rapidly with the development of convolutional neural networks. In this paper, we propose a novel structure and training strategy for monocular depth estimation to further improve the prediction accuracy of the network. We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder to generate an estimated depth map while considering local connectivity. By constructing connected paths between multi-scale local features and the global decoding stream with our proposed selective feature fusion module, the network can integrate both representations and recover fine details. In addition, the proposed decoder shows better performance than the previously proposed decoders, with considerably less…
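The selective feature fusion (SFF) module connects each multi-scale local feature to the global decoding stream. The following is a minimal NumPy sketch of a simplified design: the two same-scale features are concatenated, passed through small per-pixel (1x1) layers, and turned into a two-channel sigmoid attention map that weights the local and global features before they are summed. The 1x1 layers, the absence of batch normalization, and the helper names `conv1x1` and `init_params` are assumptions for illustration. The paper's module may use larger kernels and normalization.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight, bias):
    """Per-pixel linear map (hypothetical helper): x (C_in, H, W) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def init_params(channels, hidden, rng):
    """Random, untrained weights for illustration only (hypothetical helper)."""
    return {
        "w1": rng.normal(0.0, 0.1, (hidden, 2 * channels)), "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, hidden)), "b2": np.zeros(hidden),
        "w3": rng.normal(0.0, 0.1, (2, hidden)), "b3": np.zeros(2),
    }

def selective_feature_fusion(local_feat, global_feat, params):
    """Blend a local (skip) feature and a global (decoder) feature per pixel.

    local_feat, global_feat: (C, H, W) arrays at the same scale.
    """
    x = np.concatenate([local_feat, global_feat], axis=0)  # (2C, H, W)
    x = relu(conv1x1(x, params["w1"], params["b1"]))
    x = relu(conv1x1(x, params["w2"], params["b2"]))
    attn = sigmoid(conv1x1(x, params["w3"], params["b3"]))  # (2, H, W), values in (0, 1)
    # Channel 0 weights the local feature, channel 1 the global feature.
    return attn[0:1] * local_feat + attn[1:2] * global_feat
```

Because the two weights are computed per pixel, a trained module can lean on the local feature in some regions and on the global feature in others. With the last layer's weights and bias at zero, both weights equal 0.5 and the output reduces to a plain average of the two inputs.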
Code & Models
- 🤗 vinvino02/glpn-kitti (model) · 1.6k dl · ♡ 9
- 🤗 vinvino02/glpn-nyu (model) · 1.2k dl · ♡ 25
- 🤗 a6047425318/room-3d-scene-estimation (model) · 6 dl · ♡ 3
- 🤗 FoamoftheSea/pvt_v2_b0 (model) · 15 dl
- 🤗 OpenGVLab/pvt_v2_b0 (model) · 3.6k dl · ♡ 3
- 🤗 OpenGVLab/pvt_v2_b1 (model) · 13 dl · ♡ 1
- 🤗 OpenGVLab/pvt_v2_b2 (model) · 108 dl · ♡ 1
- 🤗 OpenGVLab/pvt_v2_b2_linear (model) · 17 dl · ♡ 1
- 🤗 OpenGVLab/pvt_v2_b3 (model) · 300 dl · ♡ 2
- 🤗 OpenGVLab/pvt_v2_b4 (model) · 12 dl · ♡ 1
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
