Video Interpolation and Prediction with Unsupervised Landmarks
Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

TL;DR
This paper introduces a method for long-range video prediction and interpolation by inferring unsupervised landmarks in a latent space, enabling interpretable and high-quality results without explicit supervision.
Contribution
The paper proposes using unsupervised latent landmarks for video prediction and interpolation, which improves interpretability and the quality of long-range interpolation.
Findings
Landmark-based latent representations effectively capture semantic parts.
Interpolation within landmark coordinates yields predictable motion.
The method achieves high-quality long-range video interpolation and extrapolation.
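To make the second finding concrete, here is a minimal sketch of what interpolating in landmark coordinates can look like. It assumes each frame has already been reduced to a fixed set of (x, y) landmark positions, and it uses plain linear interpolation as a stand-in; the paper describes a learned temporal prediction in the latent space, which this sketch does not reproduce. The function name `interpolate_landmarks` and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

# One (x, y) coordinate per landmark; the landmark count is fixed across frames.
Landmarks = List[Tuple[float, float]]


def interpolate_landmarks(start: Landmarks, end: Landmarks, num_steps: int) -> List[Landmarks]:
    """Linearly interpolate each landmark between two keyframes.

    Returns num_steps + 1 landmark sets, including both endpoints.
    This is an illustrative baseline, not the paper's learned predictor.
    """
    if len(start) != len(end):
        raise ValueError("both frames must have the same number of landmarks")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")

    frames = []
    for step in range(num_steps + 1):
        t = step / num_steps  # interpolation weight in [0, 1]
        frames.append([
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(start, end)
        ])
    return frames


# Hypothetical landmark positions for a first and last frame.
start = [(0.0, 0.0), (1.0, 1.0)]
end = [(1.0, 0.0), (1.0, -1.0)]
trajectory = interpolate_landmarks(start, end, num_steps=4)
```

Because each landmark is an explicit coordinate, the intermediate frames in `trajectory` can be inspected or edited directly before being handed to a decoder, which is the kind of manipulation an opaque latent vector does not readily allow.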
Abstract
Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting. Optical flow based techniques generalize but are suitable only for short temporal ranges. Many methods opt to project the video frames to a low dimensional latent space, achieving long-range predictions. However, these latent representations are often non-interpretable, and therefore difficult to manipulate. This work poses video prediction and interpolation as unsupervised latent structure inference followed by a temporal prediction in this latent space. The latent representations capture foreground semantics without explicit supervision such as keypoints or poses. Further, as each landmark can be mapped to a coordinate indicating where a semantic part is…
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques
