SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network

Dongseok Shim; H. Jin Kim

arXiv:2301.06715 · cs.CV · January 18, 2023

TL;DR

This paper introduces SwinDepth, an unsupervised monocular depth estimation method that uses a Swin Transformer for feature extraction and a densely cascaded network for multi-scale depth prediction. It reports better results than existing unsupervised methods on the KITTI and Make3D datasets.

Contribution

The paper proposes an architecture that combines a Swin Transformer feature extractor with densely cascaded connections to improve unsupervised depth estimation from monocular sequences.

Findings

01. Outperforms state-of-the-art unsupervised methods on KITTI and Make3D datasets.

02. Utilizes a convolution-free Swin Transformer for better feature representation.

03. Densely cascaded network enhances multi-scale depth prediction quality.

Abstract

Monocular depth estimation plays a critical role in various computer vision and robotics applications such as localization, mapping, and 3D object detection. Recently, learning-based algorithms have achieved huge success in depth estimation by training models on large amounts of data in a supervised manner. However, dense ground-truth depth labels for supervised training are difficult to acquire, and unsupervised depth estimation using monocular sequences has emerged as a promising alternative. Unfortunately, most studies on unsupervised depth estimation explore loss functions or occlusion masks, and model architecture has changed little: the ConvNet-based encoder-decoder structure has become the de facto standard for depth estimation. In this paper, we employ a convolution-free Swin Transformer as an image feature extractor so that the network can capture both local geometric…
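To make the multi-scale structure concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard Swin-T hierarchy (patch size 4, embedding width 96, resolution halved and channels doubled at each later stage); which Swin variant the paper actually uses is not stated in the text above. The helper `dense_cascade_inputs` is hypothetical: it shows one plausible reading of "densely cascaded", in which each decoder scale receives its own encoder feature plus every coarser one, and may differ from the paper's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    stage: int      # 1-based encoder stage index
    channels: int
    height: int
    width: int


def swin_stage_shapes(height, width, patch_size=4, embed_dim=96, num_stages=4):
    """Feature-map shapes of a hierarchical Swin-style encoder.

    Patch embedding downsamples by `patch_size`; every later stage starts
    with patch merging, which halves resolution and doubles channels.
    Defaults follow Swin-T (assumed here, not confirmed by the source).
    """
    total_stride = patch_size * 2 ** (num_stages - 1)
    if height % total_stride or width % total_stride:
        raise ValueError(f"input size must be divisible by {total_stride}")
    shapes = []
    for i in range(num_stages):
        stride = patch_size * 2 ** i
        shapes.append(
            FeatureMap(i + 1, embed_dim * 2 ** i, height // stride, width // stride)
        )
    return shapes


def dense_cascade_inputs(shapes):
    """Hypothetical dense cascade: map each decoder scale to its source stages.

    Each scale is fed by the encoder feature at that scale plus all coarser
    stages (which would be upsampled to match). Coarsest scale comes first.
    """
    plan = {}
    for target in reversed(shapes):
        plan[target.stage] = [s.stage for s in shapes if s.stage >= target.stage]
    return plan


if __name__ == "__main__":
    # 192x640 is an assumed input size, common in KITTI self-supervised work.
    shapes = swin_stage_shapes(192, 640)
    for fm in shapes:
        print(fm)
    for stage, sources in dense_cascade_inputs(shapes).items():
        print(f"decoder scale {stage} <- encoder stages {sources}")
```

Under these assumptions the finest decoder scale aggregates all four encoder stages, which is the sense in which coarse, global context can reach the high-resolution depth prediction.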

Code & Models

Repositories

dsshim0125/SwinDepth (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Dropout · Adam · Stochastic Depth · Byte Pair Encoding · Residual Connection · Label Smoothing · Dense Connections