Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes
Dan Xu, Andrea Vedaldi, Joao F. Henriques

TL;DR
This paper introduces an unsupervised deep learning approach for decomposing videos into 3D geometry, moving objects, and their motions, even in non-rigid scenes, using a novel small-region view synthesis technique.
Contribution
The method extends view-synthesis-based training to non-rigid scenes by modeling local rigid regions, so that depth, odometry, and object motions can be learned automatically without supervision.
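As a rough illustration of why scoring small regions helps, the sketch below computes a photometric error per patch rather than one error for the whole image. It is a minimal sketch and not the authors' implementation: the function name `patch_errors`, the non-overlapping square patches, and the mean-absolute-error measure are all assumptions made for this example.

```python
import numpy as np

def patch_errors(synth, real, patch=4):
    """Mean absolute error over non-overlapping patch x patch regions.

    Hypothetical helper for illustration; patch shape and error measure
    are assumptions, not taken from the paper.
    """
    h, w = real.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    err = np.abs(synth - real)
    # Split rows and columns into (block index, offset within block), then
    # average over the two within-block axes.
    return err.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))

# Toy example: a rigid-world warp that is wrong only where an object moved.
real = np.zeros((8, 8))
synth = real.copy()
synth[0:4, 4:8] += 1.0  # the region covered by the "moving object"

errs = patch_errors(synth, real)       # one error value per 4x4 patch
global_err = np.abs(synth - real).mean()
```

In this toy case the whole-image error is diluted across all pixels, while the per-patch errors isolate the single region that violates the rigid-world assumption and leave the other three patches at zero.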
Findings
Achieves competitive unsupervised depth and odometry results on KITTI.
Recovers object motions and segmentations on EPIC-Kitchens without ground-truth labels.
Handles non-rigid scenes by modeling local rigid regions.
Abstract
We propose a method to train deep networks to decompose videos into 3D geometry (camera and depth), moving objects, and their motions, with no supervision. We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view, specified by a predicted relative pose and depth map. By minimizing the error between the synthetic image and the corresponding real image in a video, the deep network that predicts pose and depth can be trained completely unsupervised. However, the view synthesis equations rely on a strong assumption: that objects do not move. This rigid-world assumption limits the predictive power, and rules out learning about objects automatically. We propose a simple solution: minimize the error on small regions of the image instead. While the scene as a whole may be non-rigid, it is always possible to find…
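The view synthesis step described in the abstract rests on standard pinhole camera geometry: each pixel is back-projected using its depth, moved by the relative pose, and projected into the other view. The sketch below shows that mapping under stated assumptions. The function name `reproject`, the intrinsics matrix `K`, and the pose convention (rotation `R` and translation `t` taking points from the target camera frame to the source camera frame) are illustrative choices, not details from the paper.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Map every target pixel to its (u, v) location in the source view.

    depth: (h, w) depth of each target pixel.
    K:     (3, 3) pinhole intrinsics (assumed shared by both views).
    R, t:  rotation (3, 3) and translation (3,) from target to source frame.
    Returns an array of shape (2, h, w) holding source-view u and v.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    rays = np.linalg.inv(K) @ pix         # back-project pixels to unit-depth rays
    pts = rays * depth.reshape(1, -1)     # 3D points in the target camera frame
    pts_src = R @ pts + t.reshape(3, 1)   # same points in the source camera frame
    proj = K @ pts_src                    # project with the pinhole model
    uv = proj[:2] / proj[2:3]             # perspective divide
    return uv.reshape(2, h, w)

# Toy check: constant depth and a pure sideways translation.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 10.0)
uv_same = reproject(depth, K, np.eye(3), np.zeros(3))
uv_shift = reproject(depth, K, np.eye(3), np.array([1.0, 0.0, 0.0]))
```

Sampling the source image at the returned coordinates (for example with bilinear interpolation) would give the synthetic image that is compared against the real one. With the toy values above, every pixel shifts horizontally by focal length times translation divided by depth, which is the usual parallax relation.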
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
