Enhanced Encoder-Decoder Architecture for Accurate Monocular Depth Estimation

Dabbrata Das; Argho Deb Das; Farhan Sadaf

arXiv:2410.11610·cs.CV·January 27, 2025

Open Access · 1 Repo

TL;DR

This paper presents a novel deep learning architecture using Inception-ResNet-v2 for monocular depth estimation, achieving state-of-the-art accuracy and efficiency on benchmark datasets.

Contribution

First to utilize Inception-ResNet-v2 as an encoder for monocular depth estimation, introducing multi-scale features and a composite loss for improved accuracy.

Findings

01

Achieves state-of-the-art results on the NYU Depth V2 dataset.

02

Faster inference time of 0.019 seconds on the KITTI dataset.

03

Outperforms vision transformers in efficiency while maintaining accuracy.

Abstract

Estimating depth from a single 2D image is a challenging task due to the lack of stereo or multi-view data, which are typically required for depth perception. In state-of-the-art architectures, the main challenge is to efficiently capture complex objects and fine-grained details, which are often difficult to predict. This paper introduces a novel deep learning-based approach using an enhanced encoder-decoder architecture, where the Inception-ResNet-v2 model serves as the encoder. This is the first instance of utilizing Inception-ResNet-v2 as an encoder for monocular depth estimation, demonstrating improved performance over previous models. It incorporates multi-scale feature extraction to enhance depth prediction accuracy across various object sizes and distances. We propose a composite loss function comprising depth loss, gradient edge loss, and Structural Similarity Index Measure…
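
The abstract names three terms in the composite loss (depth loss, gradient edge loss, and SSIM) before it is cut off, so the exact formulation and weighting are not available here. The following is a minimal sketch of how such a loss could be assembled, not the authors' implementation. Every specific in it is an assumption made for illustration: the mean-absolute-error form of the depth term, the forward-difference gradients, the single-window (global statistics) SSIM, the weights `w_ssim`, `w_edge`, and `w_depth`, and all function names.

```python
from typing import List, Tuple

DepthMap = List[List[float]]  # rows of per-pixel depth values


def _flatten(d: DepthMap) -> List[float]:
    return [v for row in d for v in row]


def depth_loss(y_true: DepthMap, y_pred: DepthMap) -> float:
    """Mean absolute per-pixel depth error (assumed form of the depth term)."""
    t, p = _flatten(y_true), _flatten(y_pred)
    return sum(abs(a - b) for a, b in zip(t, p)) / len(t)


def _gradients(d: DepthMap) -> Tuple[List[float], List[float]]:
    """Forward differences along x (columns) and y (rows)."""
    dx = [row[j + 1] - row[j] for row in d for j in range(len(row) - 1)]
    dy = [d[i + 1][j] - d[i][j]
          for i in range(len(d) - 1) for j in range(len(d[0]))]
    return dx, dy


def gradient_edge_loss(y_true: DepthMap, y_pred: DepthMap) -> float:
    """Mean absolute difference between depth gradients.

    Penalizes blurred or misplaced depth edges. Needs a map of at least
    2 pixels along one axis, otherwise there are no gradients to compare.
    """
    tx, ty = _gradients(y_true)
    px, py = _gradients(y_pred)
    diffs = [abs(a - b) for a, b in zip(tx + ty, px + py)]
    return sum(diffs) / len(diffs)


def ssim_loss(y_true: DepthMap, y_pred: DepthMap, max_depth: float = 1.0) -> float:
    """(1 - SSIM) / 2, with SSIM computed once over the whole map.

    Simplification: standard SSIM averages over local windows; this sketch
    uses global means and variances to stay short.
    """
    t, p = _flatten(y_true), _flatten(y_pred)
    n = len(t)
    mu_t, mu_p = sum(t) / n, sum(p) / n
    var_t = sum((a - mu_t) * (a - mu_t) for a in t) / n
    var_p = sum((b - mu_p) * (b - mu_p) for b in p) / n
    cov = sum((a - mu_t) * (b - mu_p) for a, b in zip(t, p)) / n
    c1, c2 = (0.01 * max_depth) ** 2, (0.03 * max_depth) ** 2  # usual SSIM constants
    ssim = ((2 * mu_t * mu_p + c1) * (2 * cov + c2)) / (
        (mu_t * mu_t + mu_p * mu_p + c1) * (var_t + var_p + c2)
    )
    return (1.0 - ssim) / 2.0


def composite_loss(y_true: DepthMap, y_pred: DepthMap,
                   w_ssim: float = 1.0, w_edge: float = 1.0,
                   w_depth: float = 0.1) -> float:
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_ssim * ssim_loss(y_true, y_pred)
            + w_edge * gradient_edge_loss(y_true, y_pred)
            + w_depth * depth_loss(y_true, y_pred))
```

Under these assumptions, `composite_loss(gt, gt)` is zero for any depth map `gt` with non-degenerate shape, and the edge term grows when a prediction smooths over a depth discontinuity even if the average depth error stays small. In a training setup the same three terms would be written with the tensor operations of the chosen framework so that gradients can flow through them.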

Code & Models

Repositories

dabbrata/depth-estimation-enc-dec
tf · Official

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques

Methods: 1x1 Convolution · Convolution · Softmax · Inception-ResNet-v2 Reduction-B · Max Pooling · Inception-ResNet-v2-C · Reduction-A · Residual Connection · Dropout