Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera

Yuhua Chen; Cordelia Schmid; Cristian Sminchisescu

arXiv:1907.05820 · cs.CV · September 10, 2019 · 26 citations

TL;DR

GLNet is a self-supervised framework that jointly learns depth, optical flow, camera pose, and intrinsics from monocular video by leveraging geometric constraints and online refinement, outperforming previous self-supervised methods on the KITTI and Cityscapes benchmarks.

Contribution

The paper introduces new geometric loss functions, extends the model to predict camera intrinsics, and proposes online refinement strategies for improved self-supervised learning.
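The link between the tasks can be made concrete. With a pinhole camera, a pixel's depth, the camera intrinsics, and the relative camera pose together determine where that pixel moves in the next frame (often called the rigid flow). The Python sketch below computes it for a single pixel. It is a generic illustration, not GLNet's code: the `(fx, fy, cx, cy)` intrinsics layout, the function names, and the convention that `R`, `t` map points from the first camera's frame into the second's (X' = R X + t) are all assumptions. Because the intrinsics enter both the back-projection and the projection, a model that predicts them, as this paper does, can apply the same kind of constraint to uncalibrated video.

```python
from typing import List, Sequence, Tuple

# Hypothetical intrinsics layout (fx, fy, cx, cy), chosen for this sketch only.
Intrinsics = Tuple[float, float, float, float]
Mat3 = Sequence[Sequence[float]]


def backproject(u: float, v: float, depth: float, intr: Intrinsics) -> List[float]:
    """Lift pixel (u, v) to a 3D point in camera space: depth * K^-1 [u, v, 1]^T."""
    fx, fy, cx, cy = intr
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def project(point: Sequence[float], intr: Intrinsics) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point; assumes the point has Z > 0."""
    fx, fy, cx, cy = intr
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def rigid_flow(u: float, v: float, depth: float, intr: Intrinsics,
               R: Mat3, t: Sequence[float]) -> Tuple[float, float]:
    """Flow at pixel (u, v) induced purely by camera motion X' = R X + t.

    Assumes the scene point is static; independently moving objects break this.
    """
    X = backproject(u, v, depth, intr)
    # Apply the rigid transform R X + t to move the point into the second camera.
    X_moved = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    u2, v2 = project(X_moved, intr)
    return u2 - u, v2 - v
```

For example, with `intr = (100.0, 100.0, 64.0, 48.0)`, an identity rotation, and `t = [1.0, 0.0, 0.0]`, the principal-point pixel at depth 10 moves 10 pixels horizontally, and at depth 5 it moves 20: nearer points move more. On static pixels, the difference between this rigid flow and a network's optical-flow prediction is one natural consistency residual. Moving objects violate the static-scene assumption, so such a residual is only meaningful on the static background.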

Findings

1. Outperforms previous self-supervised methods on KITTI and Cityscapes.
2. Demonstrates good transfer learning capabilities on YouTube videos.
3. Effectively integrates geometric constraints for joint task learning.

Abstract

We present GLNet, a self-supervised framework for learning depth, optical flow, camera pose, and intrinsic parameters from monocular video, addressing the difficulty of acquiring realistic ground truth for such tasks. We propose three contributions: 1) we design new loss functions that capture multiple geometric constraints (e.g., epipolar geometry), as well as an adaptive photometric loss that supports multiple moving objects, rigid and non-rigid; 2) we extend the model so that it predicts camera intrinsics, making it applicable to uncalibrated video; and 3) we propose several online refinement strategies that rely on the symmetry of our self-supervised loss in training and testing, in particular optimizing model parameters and/or the output of different tasks, thus leveraging their mutual interactions. The idea of jointly optimizing the system output, under all geometric and…
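The epipolar constraint mentioned in the abstract states that a correspondence x1 ↔ x2, written in normalized camera coordinates, satisfies x2^T [t]× R x1 = 0, where [t]× is the skew-symmetric cross-product matrix of t. A flow-derived match that disagrees with the predicted pose yields a nonzero residual, which can be penalized. The sketch below uses this textbook essential-matrix form with a mean absolute residual. The function names, the `(fx, fy, cx, cy)` intrinsics layout, the X2 = R X1 + t pose convention, and the choice of absolute value are assumptions; GLNet's exact loss may be formulated differently.

```python
from typing import List, Sequence, Tuple

Intrinsics = Tuple[float, float, float, float]  # hypothetical (fx, fy, cx, cy) layout
Pixel = Tuple[float, float]


def skew(t: Sequence[float]) -> List[List[float]]:
    """Skew-symmetric matrix [t]x, so that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul3(a, b) -> List[List[float]]:
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def normalize(p: Pixel, intr: Intrinsics) -> List[float]:
    """Map a pixel to normalized camera coordinates K^-1 [u, v, 1]^T."""
    fx, fy, cx, cy = intr
    u, v = p
    return [(u - cx) / fx, (v - cy) / fy, 1.0]


def epipolar_residual(p1: Pixel, p2: Pixel, intr: Intrinsics, R, t) -> float:
    """x2^T E x1 with E = [t]x R; zero for a match consistent with X2 = R X1 + t."""
    E = matmul3(skew(t), R)
    x1 = normalize(p1, intr)
    x2 = normalize(p2, intr)
    Ex1 = [sum(E[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))


def epipolar_loss(matches, intr: Intrinsics, R, t) -> float:
    """Mean absolute epipolar residual over (p1, p2) matches, e.g. p2 = p1 + flow(p1)."""
    total = sum(abs(epipolar_residual(p1, p2, intr, R, t)) for p1, p2 in matches)
    return total / len(matches)
```

Unlike a photometric error, this residual involves only the correspondences, the intrinsics, and the pose, not depth. It therefore ties optical flow to camera motion directly, and because the residual depends on the intrinsics, it can also inform their estimation.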

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Optical measurement and interference techniques