SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images

Pardis Taghavi; Reza Langari; Gaurav Pandey

arXiv:2403.10662 · cs.CV · March 19, 2024 · 2 cites

TL;DR

This paper introduces SwinMTL, a multi-task learning framework whose shared architecture performs depth estimation and semantic segmentation simultaneously from monocular images. It reports state-of-the-art results on Cityscapes and NYU Depth V2 without compromising computational efficiency.

Contribution

It presents a shared encoder-decoder architecture for joint depth estimation and semantic segmentation from monocular images, with a Wasserstein GAN adversarial component (a critic network) that refines the model's predictions.

Findings

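01. Outperforms existing state-of-the-art methods in both segmentation and depth estimation on the Cityscapes and NYU Depth V2 datasets.

02. Adversarial training with a Wasserstein GAN critic refines the model's predictions.

03. Ablation studies analyze the contributions of individual components, including pre-training.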

Abstract

This research paper presents an innovative multi-task learning framework that allows concurrent depth estimation and semantic segmentation using a single camera. The proposed approach is based on a shared encoder-decoder architecture, which integrates various techniques to improve the accuracy of the depth estimation and semantic segmentation tasks without compromising computational efficiency. Additionally, the paper incorporates an adversarial training component, employing a Wasserstein GAN framework with a critic network, to refine the model's predictions. The framework is thoroughly evaluated on two datasets, the outdoor Cityscapes dataset and the indoor NYU Depth V2 dataset, and it outperforms existing state-of-the-art methods in both segmentation and depth estimation tasks. We also conducted ablation studies to analyze the contributions of different components, including pre-training…
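
The adversarial component mentioned in the abstract can be illustrated with the standard Wasserstein GAN objectives. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the loss weights (`w_depth`, `w_seg`, `w_adv`), and the way the adversarial term is combined with the task losses are all assumptions made for illustration. It treats critic scores as plain lists of floats, where "real" scores come from ground-truth maps and "fake" scores come from the network's predictions.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Standard Wasserstein critic loss.

    Minimizing this pushes the critic to score ground-truth maps
    higher than predicted maps.
    """
    return fmean(fake_scores) - fmean(real_scores)


def adversarial_loss(fake_scores):
    """Adversarial term for the prediction network.

    Minimizing this raises the critic's score on the predictions.
    """
    return -fmean(fake_scores)


def total_loss(depth_loss, seg_loss, fake_scores,
               w_depth=1.0, w_seg=1.0, w_adv=0.1):
    """Hypothetical weighted sum of the two task losses and the adversarial term.

    The weights are illustrative assumptions, not values from the paper.
    """
    return (w_depth * depth_loss
            + w_seg * seg_loss
            + w_adv * adversarial_loss(fake_scores))


# Toy critic scores, made up for illustration.
real = [0.5, 1.5]    # critic outputs on ground-truth maps
fake = [-1.0, 0.0]   # critic outputs on predicted maps

print(critic_loss(real, fake))      # -1.5
print(adversarial_loss(fake))       # 0.5
```

In a real training loop the critic and the prediction network would be updated in alternation, and a Wasserstein critic additionally needs a Lipschitz constraint (weight clipping or a gradient penalty), which this sketch omits.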

Code & Models

Repositories

pardistaghavi/swinmtl (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Industrial Vision Systems and Defect Detection · Image and Object Detection Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections · Softmax