Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics

Arnav Varma; Hemang Chawla; Bahram Zonooz; Elahe Arani

arXiv:2202.03131·cs.CV·February 3, 2023

TL;DR

This paper explores the adaptation of vision transformers for self-supervised monocular depth estimation, demonstrating comparable performance to CNNs with enhanced robustness and generalization, especially when camera intrinsics are unknown.

Contribution

It introduces a method to adapt vision transformers for depth estimation and compares their performance and robustness against CNNs on benchmark datasets.

Findings

01. Transformers achieve comparable depth estimation accuracy to CNNs.

02. Transformers exhibit greater robustness to corruptions and adversarial attacks.

03. Performance remains strong even when camera intrinsics are unknown.

Abstract

The advent of autonomous driving and advanced driver assistance systems necessitates continuous developments in computer vision for 3D scene understanding. Self-supervised monocular depth estimation, a method for pixel-wise distance estimation of objects from a single camera without the use of ground truth labels, is an important task in 3D scene understanding. However, existing methods for this task are limited to convolutional neural network (CNN) architectures. In contrast with CNNs that use localized linear operations and lose feature resolution across the layers, vision transformers process at constant resolution with a global receptive field at every stage. While recent works have compared transformers against their CNN counterparts for tasks such as image classification, no study exists that investigates the impact of using transformers for self-supervised monocular depth…
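
The resolution contrast drawn in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names, the 192x640 input size, the four-stage depth, the stride-2 downsampling, and the 16-pixel patch size are all illustrative assumptions. The sketch only counts spatial positions: a hypothetical strided CNN encoder shrinks its feature map at every stage, while a plain vision transformer keeps the same number of tokens throughout.

```python
def cnn_feature_sizes(height, width, num_stages, stride=2):
    """Spatial size after each stage of a hypothetical strided CNN encoder.

    Assumes each stage downsamples once with "same"-style padding, so the
    output size is ceil(input / stride). This is an illustrative model,
    not the encoder used in the paper.
    """
    sizes = []
    for _ in range(num_stages):
        height = -(-height // stride)  # ceiling division
        width = -(-width // stride)
        sizes.append((height, width))
    return sizes


def vit_token_counts(height, width, num_stages, patch=16):
    """Token count at each stage of a plain ViT encoder.

    Assumes non-overlapping patches and no downsampling between stages,
    so the count is fixed once the image is split into patches.
    """
    tokens = (height // patch) * (width // patch)
    return [tokens] * num_stages


if __name__ == "__main__":
    # 192x640 is an assumed input size chosen only for illustration.
    print("CNN feature map sizes:", cnn_feature_sizes(192, 640, 4))
    print("ViT tokens per stage: ", vit_token_counts(192, 640, 4))
```

Under these assumptions the CNN feature map shrinks from 96x320 to 12x40 over four stages, whereas the transformer holds 480 tokens at every stage, and self-attention lets each token attend to all the others.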

Code & Models

Repositories

neurai-lab/mt-sfmlearner (PyTorch)

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Domain Adaptation and Few-Shot Learning