Enhanced Encoder-Decoder Architecture for Accurate Monocular Depth Estimation
Dabbrata Das, Argho Deb Das, Farhan Sadaf

TL;DR
This paper presents a novel deep learning architecture using Inception-ResNet-v2 for monocular depth estimation, achieving state-of-the-art accuracy and efficiency on benchmark datasets.
Contribution
First to utilize Inception-ResNet-v2 as an encoder for monocular depth estimation, introducing multi-scale features and a composite loss for improved accuracy.
Findings
Achieves state-of-the-art results on the NYU Depth V2 dataset.
Reports an inference time of 0.019 seconds on the KITTI dataset.
Outperforms vision transformers in efficiency while maintaining accuracy.
Abstract
Estimating depth from a single 2D image is a challenging task due to the lack of stereo or multi-view data, which are typically required for depth perception. In state-of-the-art architectures, the main challenge is to efficiently capture complex objects and fine-grained details, which are often difficult to predict. This paper introduces a novel deep learning-based approach using an enhanced encoder-decoder architecture, where the Inception-ResNet-v2 model serves as the encoder. This is the first instance of utilizing Inception-ResNet-v2 as an encoder for monocular depth estimation, demonstrating improved performance over previous models. It incorporates multi-scale feature extraction to enhance depth prediction accuracy across various object sizes and distances. We propose a composite loss function comprising depth loss, gradient edge loss, and Structural Similarity Index Measure…
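The abstract is cut off before it gives the exact form or weighting of the three loss terms, so the following is only a minimal sketch of how a composite loss of this kind is commonly assembled, not the authors' formulation. Assumptions made for illustration: the depth term is a mean absolute error, the edge term compares finite-difference gradients, SSIM is computed once over the whole image rather than over sliding windows, depth maps are normalized to [0, 1], and the weights w_depth, w_grad, and w_ssim are hypothetical.

```python
import numpy as np


def depth_l1(pred, target):
    """Mean absolute per-pixel depth error (assumed form of the depth loss)."""
    return float(np.mean(np.abs(pred - target)))


def gradient_edge_loss(pred, target):
    """Mean absolute difference between vertical and horizontal depth gradients."""
    dy_p, dx_p = np.diff(pred, axis=0), np.diff(pred, axis=1)
    dy_t, dx_t = np.diff(target, axis=0), np.diff(target, axis=1)
    return float(np.mean(np.abs(dy_p - dy_t)) + np.mean(np.abs(dx_p - dx_t)))


def global_ssim(pred, target, data_range=1.0):
    """SSIM over the whole image as a single window (a simplification)."""
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(num / den)


def composite_loss(pred, target, w_depth=1.0, w_grad=1.0, w_ssim=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    ssim_term = (1.0 - global_ssim(pred, target)) / 2.0  # 0 when images match
    return (w_depth * depth_l1(pred, target)
            + w_grad * gradient_edge_loss(pred, target)
            + w_ssim * ssim_term)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 8))              # stand-in ground-truth depth map
    pred = np.clip(target + 0.05 * rng.standard_normal((8, 8)), 0.0, 1.0)
    print(composite_loss(pred, target))
```

In a training pipeline the same three terms would be written with a differentiable framework and a windowed SSIM; the sketch only shows how the terms combine, with the loss dropping to zero when the prediction matches the target.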
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques
Methods: 1x1 Convolution · Convolution · Softmax · Inception-ResNet-v2 Reduction-B · Max Pooling · Inception-ResNet-v2-C · Reduction-A · Residual Connection · Dropout
