USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks
Johan Vertens, Wolfram Burgard

TL;DR
USegScene is a semantically guided, coupled neural network framework for unsupervised learning of depth, optical flow, and ego-motion from stereo camera images. It combines semantic information with joint prediction across the three tasks and reports improved accuracy on the KITTI dataset.
Contribution
The paper presents a novel architecture that jointly predicts depth, optical flow, and ego-motion with semantic guidance, enabling cross-task information sharing and explicit occlusion learning.
Findings
Outperforms existing methods on the KITTI dataset
Joint prediction improves accuracy of depth and flow estimation
Semantic guidance enhances regularization and occlusion handling
Abstract
In this paper we propose USegScene, a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images using convolutional neural networks. Our framework leverages semantic information for improved regularization of depth and optical flow maps, multimodal fusion and occlusion filling considering dynamic rigid object motions as independent SE(3) transformations. Furthermore, complementary to pure photo-metric matching, we propose matching of semantic features, pixel-wise classes and object instance borders between the consecutive images. In contrast to previous methods, we propose a network architecture that jointly predicts all outputs using shared encoders and allows passing information across the task-domains, e.g., the prediction of optical flow can benefit from the prediction of the depth. Furthermore, we explicitly…
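The link between depth, SE(3) motion, and optical flow mentioned in the abstract can be illustrated with standard pinhole-camera geometry. The sketch below is not the authors' implementation. It only shows how a depth map and a rigid transform induce a flow field, and how per-object transforms could be combined using an instance mask. The function names `rigid_flow` and `compose_flow`, the intrinsics matrix `K`, and the mask layout are assumptions made for illustration.

```python
import numpy as np

def rigid_flow(depth, K, T):
    """Flow (2, H, W) induced by a rigid SE(3) transform T (4x4),
    given per-pixel depth (H, W) and camera intrinsics K (3x3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)

    # Back-project each pixel to a 3D point in the first camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)

    # Apply the rigid transform in homogeneous coordinates.
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts2 = (T @ pts_h)[:3]

    # Project back to the image plane and subtract the original pixel grid.
    proj = K @ pts2
    uv2 = proj[:2] / proj[2:3]
    return (uv2 - pix[:2]).reshape(2, h, w)

def compose_flow(depth, K, T_scene, T_object, object_mask):
    """Hypothetical helper: use T_object inside the boolean instance
    mask (H, W) and T_scene everywhere else."""
    flow_scene = rigid_flow(depth, K, T_scene)
    flow_object = rigid_flow(depth, K, T_object)
    return np.where(object_mask[None], flow_object, flow_scene)
```

With constant depth Z and a pure translation t_x along the camera x-axis, every pixel moves horizontally by f_x * t_x / Z, which is a quick sanity check for the projection. A learned flow network would still be needed for non-rigid motion and for pixels where depth or pose estimates are unreliable.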
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Advanced Image Processing Techniques
