Towards Comprehensive Monocular Depth Estimation: Multiple Heads Are Better Than One
Shuwei Shao, Ran Li, Zhongcai Pei, Zhong Liu, Weihai Chen, Wentao Zhu, Xingming Wu, Baochang Zhang

TL;DR
This paper introduces TEDepth, a novel depth estimation approach that combines Transformer and CNN-based predictors through adaptive fusion, achieving superior accuracy and generalizability on standard datasets.
Contribution
It proposes a multi-architecture ensemble framework for monocular depth estimation, integrating Transformer and CNN models with adaptive fusion for improved accuracy.
Findings
TEDepth outperforms previous state-of-the-art methods on the NYU-Depth-v2 and KITTI datasets.
The ensemble approach demonstrates strong generalization to the SUN RGB-D dataset without fine-tuning.
Combining Transformer and CNN models provides complementary depth estimates, enhancing overall performance.
Abstract
Depth estimation attracts widespread attention in the computer vision community. However, it is still quite difficult to recover an accurate depth map from a single RGB image. We observe that existing methods tend to fail in different cases, owing to differences in network architecture, loss function, and so on. In this work, we investigate this phenomenon and propose to integrate the strengths of multiple weak depth predictors to build a comprehensive and accurate depth predictor, which is critical for many real-world applications, e.g., 3D reconstruction. Specifically, we construct multiple base (weak) depth predictors by utilizing different Transformer-based and convolutional neural network (CNN)-based architectures. The Transformer establishes long-range correlations, while the CNN preserves local information that the Transformer ignores, thanks to the CNN's spatial inductive bias.…
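The summary above mentions adaptive fusion of several base predictors but does not spell out the mechanism. The following is a minimal sketch of one plausible form of it, assuming that each pixel gets one unnormalized score per predictor and that the fused depth is the softmax-weighted average of the base depths. The function names (`softmax`, `adaptive_fuse`), the score maps, and the toy values are all hypothetical and are not taken from the paper; in a real system the scores would come from a learned module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(depth_maps, score_maps):
    """Fuse K depth maps (H x W nested lists) with per-pixel weights.

    Assumption: score_maps[k][i][j] is an unnormalized confidence for
    predictor k at pixel (i, j). This weighting scheme is illustrative
    and is not the paper's confirmed fusion rule.
    """
    height, width = len(depth_maps[0]), len(depth_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            weights = softmax([s[i][j] for s in score_maps])
            fused[i][j] = sum(w * d[i][j] for w, d in zip(weights, depth_maps))
    return fused


# Toy 1 x 2 "images" from two hypothetical base predictors.
transformer_depth = [[2.0, 4.0]]
cnn_depth = [[4.0, 8.0]]

# Equal scores at pixel 0; the first predictor is strongly favored at pixel 1.
transformer_scores = [[0.0, 10.0]]
cnn_scores = [[0.0, -10.0]]

fused = adaptive_fuse(
    [transformer_depth, cnn_depth],
    [transformer_scores, cnn_scores],
)
```

With equal scores the fused value is the plain average of the two estimates; when one predictor's score dominates, the fused value approaches that predictor's estimate. Pure Python lists keep the sketch dependency-free; an array library would be the natural choice for real depth maps.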
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques
