ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation

Suraj Patni; Aradhye Agarwal; Chetan Arora

arXiv:2403.18807·cs.CV·April 18, 2024·2 cites

TL;DR

ECoDepth introduces a novel depth estimation model conditioned on ViT embeddings, achieving state-of-the-art accuracy on standard datasets and strong zero-shot transfer performance by leveraging pre-trained image priors.

Contribution

The paper proposes a new single image depth estimation (SIDE) model that uses a diffusion backbone conditioned on ViT embeddings, surpassing previous methods in accuracy and zero-shot transfer.

Findings

1. Achieves SOTA on NYUv2 with 0.059 Abs Rel error.
2. Improves KITTI Sq Rel error to 0.139.
3. Demonstrates strong zero-shot transfer across multiple datasets.
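
The Abs Rel and Sq Rel figures quoted above are standard depth-estimation error metrics. This page does not spell out their formulas, so the following is only a minimal sketch assuming the commonly used definitions: the mean of |pred - gt| / gt and the mean of (pred - gt)^2 / gt over pixels with valid ground truth. The function names and the toy depth values are hypothetical and are not taken from the paper or its repository.

```python
def _valid_pairs(pred, gt):
    """Keep only pixels with a positive ground-truth depth (assumed validity rule)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return pairs


def abs_rel(pred, gt):
    """Absolute relative error: mean of |pred - gt| / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def sq_rel(pred, gt):
    """Squared relative error: mean of (pred - gt)^2 / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


# Toy depths in metres (illustrative only); the 0.0 entry marks a missing measurement.
gt_depth = [2.0, 4.0, 0.0, 5.0]
pred_depth = [2.2, 3.6, 1.0, 5.0]

print(f"Abs Rel: {abs_rel(pred_depth, gt_depth):.4f}")
print(f"Sq Rel:  {sq_rel(pred_depth, gt_depth):.4f}")
```

Both metrics are lower-is-better. Benchmarks differ in details such as depth caps and crop regions, so numbers computed this way are comparable to published results only when the same evaluation protocol is used.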

Abstract

In the absence of parallax cues, a learning-based single image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, it is necessary to train such models on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundational models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures more relevant information for SIDE than the usual route of generating pseudo image captions, followed by CLIP-based text embeddings. Based on this idea, we propose a new SIDE model using a…

Code & Models

Repositories

aradhye2002/ecodepth (PyTorch, official)

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Optical measurement and interference techniques · 3D Shape Modeling and Analysis

Methods: Softmax · Linear Layer · Layer Normalization · Residual Connection · Attention Is All You Need · Dense Connections · Multi-Head Attention · Vision Transformer · Diffusion