Video Autoencoder: self-supervised disentanglement of static 3D structure and motion
Zihang Lai, Sifei Liu, Alexei A. Efros, Xiaolong Wang

TL;DR
This paper introduces a self-supervised video autoencoder that disentangles 3D scene structure and camera motion from videos, enabling tasks like novel view synthesis and pose estimation without ground truth annotations.
Contribution
It presents a novel self-supervised method for disentangling 3D structure and motion in videos using a deep autoencoder trained with pixel reconstruction loss.
Findings
Effective disentanglement of 3D structure and camera pose.
Successful application to view synthesis and pose estimation.
Good generalization to out-of-domain videos.
Abstract
A video autoencoder is proposed for learning disentangled representations of 3D structure and camera pose from videos in a self-supervised manner. Relying on temporal continuity in videos, our work assumes that the 3D scene structure in nearby video frames remains static. Given a sequence of video frames as input, the video autoencoder extracts a disentangled representation of the scene including: (i) a temporally-consistent deep voxel feature to represent the 3D structure and (ii) a 3D trajectory of camera pose for each frame. These two representations will then be re-entangled for rendering the input video frames. This video autoencoder can be trained directly using a pixel reconstruction loss, without any ground truth 3D or camera pose annotations. The disentangled representation can be applied to a range of tasks, including novel view synthesis, camera pose estimation, and video…
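The re-entangling step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the nearest-neighbour voxel warp, the depth-averaging "decoder", and the L1 loss are stand-in assumptions chosen to keep the example short, and the names `rotation_y`, `warp_voxels`, `render`, and `reconstruction_loss` are hypothetical. In the paper both the voxel feature and the camera trajectory are predicted by learned networks; here they are simply given.

```python
import numpy as np

def rotation_y(theta):
    """3x3 rotation about the vertical (y) axis; stands in for a predicted camera pose."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def warp_voxels(voxels, R, t):
    """Resample a (D, H, W) voxel grid under a rigid transform.

    Assumption: nearest-neighbour lookup on a [-1, 1]^3 grid; a learned
    model would typically use differentiable trilinear sampling instead.
    """
    D, H, W = voxels.shape
    zs, ys, xs = np.meshgrid(
        np.linspace(-1, 1, D), np.linspace(-1, 1, H), np.linspace(-1, 1, W),
        indexing="ij",
    )
    target = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
    # Inverse rigid transform R^T (p - t), written for row vectors.
    source = (target - t) @ R
    sizes = np.array([W, H, D])
    idx = np.rint((source + 1) / 2 * (sizes - 1)).astype(int)
    valid = np.all((idx >= 0) & (idx < sizes), axis=1)
    out = np.zeros(D * H * W, dtype=voxels.dtype)
    out[valid] = voxels[idx[valid, 2], idx[valid, 1], idx[valid, 0]]
    return out.reshape(D, H, W)

def render(voxels):
    """Toy decoder (assumption): average along depth to get an (H, W) image."""
    return voxels.mean(axis=0)

def reconstruction_loss(pred, frame):
    """Pixel reconstruction loss; L1 is assumed here for illustration."""
    return float(np.abs(pred - frame).mean())

# One static structure, one pose per frame: re-entangle, render, compare.
rng = np.random.default_rng(0)
structure = rng.random((8, 8, 8))            # stands in for the deep voxel feature
poses = [0.0, 0.1, 0.2]                      # stands in for the camera trajectory
frames = [render(warp_voxels(structure, rotation_y(a), np.zeros(3))) for a in poses]
loss = sum(
    reconstruction_loss(render(warp_voxels(structure, rotation_y(a), np.zeros(3))), f)
    for a, f in zip(poses, frames)
)
```

Because every frame is rendered from the same `structure`, the only per-frame degree of freedom is the pose. That shared-structure constraint is what the static-scene assumption in the abstract provides, and it is why a pixel loss alone can drive the separation of structure from camera motion.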
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
