USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks

Johan Vertens; Wolfram Burgard

arXiv:2207.07469 · cs.CV · July 18, 2022 · 1 citation

TL;DR

USegScene introduces a semantically guided framework of coupled neural networks for unsupervised learning of depth, optical flow and ego-motion, significantly improving accuracy on the KITTI dataset by leveraging semantic information and joint predictions.

Contribution

The paper presents a novel architecture that jointly predicts depth, optical flow, and ego-motion with semantic guidance, enabling cross-task information sharing and explicit occlusion learning.
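The cross-task sharing rests on a standard geometric fact: for static parts of the scene, a depth map plus the camera ego-motion fully determines the optical flow. The NumPy sketch below computes this "rigid flow" by back-projecting pixels, applying the camera motion and re-projecting. It is an illustration of the geometry only, not the paper's network; the function name `rigid_flow`, the shared pinhole intrinsics `K` and the 4x4 transform `T` from frame t to frame t+1 are assumptions.

```python
import numpy as np


def rigid_flow(depth, K, T):
    """Optical flow induced by camera motion on a static scene (hypothetical helper).

    depth: (H, W) positive depth map of frame t.
    K:     (3, 3) pinhole intrinsics, assumed identical for both frames.
    T:     (4, 4) rigid transform mapping frame-t camera coordinates
           to frame-(t+1) camera coordinates (the ego-motion).
    Returns an (H, W, 2) array of (du, dv) displacements in pixels.
    """
    H, W = depth.shape
    # Homogeneous pixel coordinates, shape (3, H*W), in row-major order.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(np.float64)
    # Back-project every pixel to a 3D point in the frame-t camera.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Apply the ego-motion: X' = R X + t.
    pts_next = T[:3, :3] @ pts + T[:3, 3:4]
    # Re-project into the image plane of frame t+1.
    proj = K @ pts_next
    uv_next = proj[:2] / proj[2:3]
    # Flow is the displacement between re-projected and original pixels.
    return (uv_next - pix[:2]).T.reshape(H, W, 2)
```

For example, with focal length 100 px, constant depth 2 and a sideways camera translation of 0.1, every pixel moves by 100 · 0.1 / 2 = 5 px horizontally. On independently moving objects this static-scene flow is wrong, which is why the abstract models dynamic objects with their own SE(3) transformations.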

Findings

1. Outperforms existing methods on the KITTI dataset
2. Joint prediction improves the accuracy of depth and flow estimation
3. Semantic guidance enhances regularization and occlusion handling
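The summary does not spell out the exact regularizer. As an assumption, a common choice is an edge-aware first-order smoothness term; the sketch below takes its edges from a semantic class map instead of image gradients, so the predicted disparity may change sharply at object boundaries but is pushed to be smooth inside each class region. The names `semantic_smoothness` and `boundary_weight` are hypothetical.

```python
import numpy as np


def semantic_smoothness(disp, classes, boundary_weight=0.0):
    """First-order smoothness penalty relaxed across semantic class boundaries.

    disp:    (H, W) predicted disparity (or inverse depth).
    classes: (H, W) integer semantic label per pixel.
    boundary_weight: weight used where neighbouring pixels have different
        classes; 0.0 switches the penalty off at class boundaries.
    """
    # Absolute finite differences along x and y.
    dx = np.abs(disp[:, 1:] - disp[:, :-1])
    dy = np.abs(disp[1:, :] - disp[:-1, :])
    # Full weight inside a class region, boundary_weight across classes.
    wx = np.where(classes[:, 1:] == classes[:, :-1], 1.0, boundary_weight)
    wy = np.where(classes[1:, :] == classes[:-1, :], 1.0, boundary_weight)
    return (wx * dx).mean() + (wy * dy).mean()
```

A disparity step that coincides with a class boundary costs nothing with the default weight, while the same step inside a single class region is penalized.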

Abstract

In this paper, we propose USegScene, a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images using convolutional neural networks. Our framework leverages semantic information for improved regularization of depth and optical flow maps, for multimodal fusion, and for occlusion filling that treats dynamic rigid object motions as independent SE(3) transformations. Furthermore, complementary to pure photometric matching, we propose matching of semantic features, pixel-wise classes and object instance borders between consecutive images. In contrast to previous methods, we propose a network architecture that jointly predicts all outputs using shared encoders and allows information to pass across task domains; for example, the prediction of optical flow can benefit from the prediction of depth. Furthermore, we explicitly…
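The abstract proposes matching pixel-wise classes between consecutive images in addition to photometric matching. The summary does not give the exact loss, so the NumPy sketch below is an illustrative assumption. It backward-warps the source image and its per-pixel class probabilities with a flow field using bilinear sampling. It then adds a cross-entropy term to an L1 photometric error. `warp`, `matching_loss` and `sem_weight` are hypothetical names. Pixels warped outside the image are simply masked out; the paper instead learns occlusion explicitly.

```python
import numpy as np


def warp(src, flow):
    """Backward-warp src (H, W, C) with flow (H, W, 2) using bilinear sampling.

    Target pixel (x, y) samples src at (x + du, y + dv). Returns the warped
    array and a boolean mask of samples that fell inside the image.
    """
    H, W = src.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    x = xs + flow[..., 0]
    y = ys + flow[..., 1]
    valid = (x >= 0) & (x <= W - 1) & (y >= 0) & (y <= H - 1)
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    out = ((1 - wx) * (1 - wy) * src[y0, x0] + wx * (1 - wy) * src[y0, x1]
           + (1 - wx) * wy * src[y1, x0] + wx * wy * src[y1, x1])
    return out, valid


def matching_loss(tgt_img, src_img, tgt_labels, src_probs, flow, sem_weight=0.5):
    """L1 photometric error plus a semantic class-matching term.

    tgt_img, src_img: (H, W, 3) images with values in [0, 1].
    tgt_labels:       (H, W) integer class per target pixel.
    src_probs:        (H, W, C) per-pixel class probabilities of the source.
    """
    warped_img, valid = warp(src_img, flow)
    warped_probs, _ = warp(src_probs, flow)
    if not valid.any():
        return 0.0
    photo = np.abs(warped_img - tgt_img).mean(axis=-1)
    # Probability that the warped source assigns to the target pixel's class.
    p = np.take_along_axis(warped_probs, tgt_labels[..., None], axis=-1)[..., 0]
    sem = -np.log(np.clip(p, 1e-6, 1.0))
    per_pixel = photo + sem_weight * sem
    # Average only over pixels whose source sample lies inside the image.
    return per_pixel[valid].mean()
```

With a correct flow and consistent class predictions, both terms vanish on the valid pixels. If the source probabilities are uncertain, the semantic term stays positive even when the photometric error is zero.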

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Advanced Image Processing Techniques