Towards Comprehensive Monocular Depth Estimation: Multiple Heads Are Better Than One

Shuwei Shao; Ran Li; Zhongcai Pei; Zhong Liu; Weihai Chen; Wentao Zhu; Xingming Wu; Baochang Zhang

arXiv:2111.08313·cs.CV·September 26, 2023

Open Access

TL;DR

This paper introduces TEDepth, a monocular depth estimation approach that combines Transformer-based and CNN-based predictors through adaptive fusion. It reports higher accuracy than previous state-of-the-art methods on NYU-Depth-v2 and KITTI, and it generalizes to SUN RGB-D without fine-tuning.

Contribution

It proposes a multi-architecture ensemble framework for monocular depth estimation, integrating Transformer and CNN models with adaptive fusion for improved accuracy.

Findings

01

TEDepth outperforms previous state-of-the-art methods on NYU-Depth-v2 and KITTI datasets.

02

The ensemble approach demonstrates strong generalization to the SUN RGB-D dataset without fine-tuning.

03

Combining Transformer and CNN models provides complementary depth estimates, enhancing overall performance.

Abstract

Depth estimation attracts widespread attention in the computer vision community. However, it is still quite difficult to recover an accurate depth map using only one RGB image. We observe that existing methods tend to fail in different cases, owing to differences in network architecture, loss function and so on. In this work, we investigate this phenomenon and propose to integrate the strengths of multiple weak depth predictors to build a comprehensive and accurate depth predictor, which is critical for many real-world applications, e.g., 3D reconstruction. Specifically, we construct multiple base (weak) depth predictors by utilizing different Transformer-based and convolutional neural network (CNN)-based architectures. Transformers establish long-range correlations, while CNNs, owing to their spatial inductive bias, preserve local information that Transformers ignore.…
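
As an illustration of what fusing several base predictors can look like, here is a minimal sketch in NumPy. It is not the paper's fusion module, whose details are not given in this excerpt. It assumes each predictor outputs a depth map of the same size, and that a per-pixel score map is available for each predictor; the function name `fuse_depths` and the score maps are hypothetical. The scores are turned into per-pixel softmax weights, and the fused depth is the weighted average.

```python
import numpy as np


def fuse_depths(depths, scores):
    """Fuse K depth maps with per-pixel softmax weights.

    depths: array of shape (K, H, W), one depth map per base predictor.
    scores: array of shape (K, H, W), unnormalized per-pixel scores
            (hypothetical input; higher means "trust this predictor more").
    Returns (fused, weights) with shapes (H, W) and (K, H, W).
    """
    depths = np.asarray(depths, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    if depths.shape != scores.shape:
        raise ValueError("depths and scores must have the same shape")

    # Numerically stable softmax over the predictor axis.
    shifted = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)

    # Per-pixel convex combination of the base predictions.
    fused = (weights * depths).sum(axis=0)
    return fused, weights


# Toy example: two predictors (made-up values, not results from the paper).
transformer_depth = np.array([[2.0, 4.0], [6.0, 8.0]])
cnn_depth = np.array([[4.0, 4.0], [2.0, 10.0]])
depths = np.stack([transformer_depth, cnn_depth])

# Equal scores reduce the fusion to a plain per-pixel average.
scores = np.zeros_like(depths)
fused, weights = fuse_depths(depths, scores)
print(fused)
```

Because the weights at each pixel are non-negative and sum to one, the fused value always lies between the smallest and largest base prediction at that pixel. In a learned setting the scores would come from a trained network rather than being supplied by hand.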

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques