Self-Supervised Monocular Scene Decomposition and Depth Estimation

Sadra Safadoust; Fatma Güney

arXiv:2110.11275·cs.CV·October 22, 2021

TL;DR

This paper introduces MonoDepthSeg, a self-supervised method that jointly estimates depth and segments moving objects in monocular video without ground-truth labels, improving depth estimation on driving datasets.

Contribution

It decomposes the scene into a fixed number of components, each an image region with its own transformation matrix, so that independently moving objects receive separate motion estimates; this improves depth estimation.

Findings

01

Improves depth estimation on three driving datasets.

02

Segments moving objects without any ground-truth labels.

03

Estimates the mask and the motion of each component efficiently with a shared encoder.

Abstract

Self-supervised monocular depth estimation approaches either ignore independently moving objects in the scene or need a separate segmentation step to identify them. We propose MonoDepthSeg to jointly estimate depth and segment moving objects from monocular video without using any ground-truth labels. We decompose the scene into a fixed number of components where each component corresponds to a region on the image with its own transformation matrix representing its motion. We estimate both the mask and the motion of each component efficiently with a shared encoder. We evaluate our method on three driving datasets and show that our model clearly improves depth estimation while decomposing the scene into separately moving components.
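
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: the masks are soft and sum to one at every pixel, each component's motion is a 4×4 rigid transformation, the camera is a pinhole with known intrinsics `K`, and the per-component motions are combined by mask-weighted blending of the transformed 3D points. The function names `backproject` and `warp_coords` are hypothetical.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel to a 3D camera-frame point: X = D * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    rays = np.linalg.inv(K) @ pix
    return rays * depth.reshape(1, -1)  # shape (3, H*W)

def warp_coords(depth, K, masks, transforms):
    """Project each pixel into the other frame using per-component motion.

    depth:      (H, W) depth map of the current frame
    K:          (3, 3) pinhole intrinsics (assumed known)
    masks:      (C, H, W) component masks, assumed to sum to 1 over C
    transforms: (C, 4, 4) one transformation matrix per component
    Returns (2, H, W) pixel coordinates (u, v) in the other frame.
    """
    h, w = depth.shape
    pts = backproject(depth, K)                                   # (3, N)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])          # (4, N)
    # Apply every component's transformation to every point.
    moved = np.einsum("cij,jn->cin", transforms, pts_h)[:, :3]    # (C, 3, N)
    # Blend the C candidate positions with the mask weights (an assumption).
    weights = masks.reshape(masks.shape[0], 1, -1)                # (C, 1, N)
    blended = (weights * moved).sum(axis=0)                       # (3, N)
    proj = K @ blended
    uv = proj[:2] / proj[2:3]
    return uv.reshape(2, h, w)
```

In a self-supervised setup of this kind, coordinates such as these would typically be used to sample the neighboring frame and compare the result with the current frame; with identity transformations for every component, the returned coordinates reduce to the original pixel grid.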
