Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes

Dan Xu; Andrea Vedaldi; Joao F. Henriques

arXiv:2105.02195 · cs.CV · June 2, 2021 · 1 cites

TL;DR

This paper introduces an unsupervised deep learning approach for decomposing videos into 3D geometry, moving objects, and their motions, even in non-rigid scenes, using a novel small-region view synthesis technique.

Contribution

It extends view synthesis-based training to non-rigid scenes by modeling local rigid regions, enabling automatic learning of depth, odometry, and object motions without supervision.

Findings

01. Achieves competitive unsupervised depth and odometry on KITTI.

02. Recovers object motions and segmentations on EPIC-Kitchens without ground truth.

03. Handles non-rigid scenes by modeling local rigid regions.

Abstract

We propose a method to train deep networks to decompose videos into 3D geometry (camera and depth), moving objects, and their motions, with no supervision. We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view, specified by a predicted relative pose and depth map. By minimizing the error between the synthetic image and the corresponding real image in a video, the deep network that predicts pose and depth can be trained completely unsupervised. However, the view synthesis equations rely on a strong assumption: that objects do not move. This rigid-world assumption limits the predictive power, and rules out learning about objects automatically. We propose a simple solution: minimize the error on small regions of the image instead. While the scene as a whole may be non-rigid, it is always possible to find…
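As a rough illustration of the view synthesis idea described in the abstract, the sketch below reprojects every pixel of a target frame into a source frame using a depth map, camera intrinsics, and a relative pose, and then measures the photometric error per small image region rather than over the whole image. This is not the authors' implementation: the function names `reproject` and `patch_errors`, the pinhole camera model, the non-overlapping square patches, and the mean absolute error are all assumptions chosen for illustration, and the image sampling step that would turn the reprojected coordinates into a synthetic image is omitted.

```python
import numpy as np


def reproject(depth, K, R, t):
    """Map each target pixel to coordinates in the source image.

    Hypothetical helper, assuming a pinhole camera.
    depth: (H, W) depth of the target frame (must be non-zero).
    K:     (3, 3) camera intrinsics.
    R, t:  (3, 3) rotation and (3,) translation, target to source.
    Returns an (H, W, 2) array of (x, y) source coordinates.
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates (x, y, 1) for every target pixel.
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    # Back-project to 3D: depth * K^-1 * p.
    points = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Move the points into the source camera frame.
    moved = points @ R.T + t
    # Project back onto the source image plane.
    proj = moved @ K.T
    return proj[..., :2] / proj[..., 2:3]


def patch_errors(synth, real, patch=4):
    """Mean absolute error per non-overlapping patch (assumed region shape).

    synth, real: (H, W) grayscale images.
    Returns an (H // patch, W // patch) array of per-region errors.
    """
    H, W = real.shape
    h, w = H // patch, W // patch
    diff = np.abs(synth - real)[: h * patch, : w * patch]
    return diff.reshape(h, patch, w, patch).mean(axis=(1, 3))


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 4.0], [0.0, 100.0, 3.0], [0.0, 0.0, 1.0]])
    depth = np.full((6, 8), 2.0)
    coords = reproject(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
    print(coords[0, 0])  # source location of the top-left target pixel
```

Under a rigid-world assumption, a single pose `(R, t)` would be applied to the whole frame. Computing the error per region, as in `patch_errors`, is one simple way to see which parts of the image a given pose explains well and which it does not.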

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization