Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning
Karl Pertsch, Oleh Rybkin, Jingyun Yang, Shenghao Zhou, Konstantinos G. Derpanis, Kostas Daniilidis, Joseph Lim, Andrew Jaegle

TL;DR
This paper introduces KeyIn, a hierarchical model that discovers keyframes in videos and uses them for video prediction and planning, capturing the essential dynamics of the scene.
Contribution
The paper presents a novel differentiable hierarchical model that identifies keyframes and uses them for improved video prediction and planning.
Findings
KeyIn accurately discovers informative keyframes across diverse datasets.
KeyIn outperforms recent hierarchical models in predictive planning tasks.
The model captures essential scene dynamics with a small set of keyframes rather than every frame.
Abstract
Temporal observations such as videos contain essential information about the dynamics of the underlying scene, but they are often interleaved with inessential, predictable details. One way of dealing with this problem is by focusing on the most informative moments in a sequence. We propose a model that learns to discover these important events and the times when they occur and uses them to represent the full sequence. We do so using a hierarchical Keyframe-Inpainter (KeyIn) model that first generates a video's keyframes and then inpaints the rest by generating the frames at the intervening times. We propose a fully differentiable formulation to efficiently learn this procedure. We show that KeyIn finds informative keyframes in several datasets with different dynamics and visual properties. KeyIn outperforms other recent hierarchical predictive models for planning. For more details,…
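The differentiable keyframe placement mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each keyframe's timing is given as a probability distribution over the offset from the previous keyframe, that absolute keyframe times follow by convolving these offset distributions, and that each keyframe's soft training target is the expectation of the ground-truth frames under its time distribution. The function names, array shapes, and the use of NumPy are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def keyframe_time_distributions(offset_probs, seq_len):
    """Turn per-keyframe offset distributions into absolute-time distributions.

    offset_probs: (N, J) array; row n is a distribution over offsets 1..J
                  between keyframe n-1 and keyframe n (hypothetical layout).
    Returns an (N, seq_len) array; row n, column t-1 is the probability that
    keyframe n falls on time step t. Mass beyond seq_len is dropped, so rows
    can sum to less than 1 when seq_len < N * J.
    """
    n_keyframes = offset_probs.shape[0]
    times = np.zeros((n_keyframes, seq_len))
    prev = np.zeros(seq_len + 1)
    prev[0] = 1.0  # before the first keyframe we are at time 0 with certainty
    for n in range(n_keyframes):
        # Offsets start at 1, so prepend zero probability for offset 0.
        offsets = np.concatenate([[0.0], offset_probs[n]])
        # Distribution of a sum of independent variables is a convolution.
        cur = np.convolve(prev, offsets)[: seq_len + 1]
        times[n] = cur[1:]
        prev = cur
    return times

def soft_keyframe_targets(time_probs, frames):
    """Expected ground-truth frame for each keyframe under its time distribution.

    time_probs: (N, seq_len); frames: (seq_len, ...) with any trailing shape.
    """
    return np.tensordot(time_probs, frames, axes=(1, 0))

# Usage: two keyframes with one-hot offsets of 2 and 3 steps.
offset_probs = np.array([[0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])
frames = np.arange(6, dtype=float).reshape(6, 1)  # toy "frames": one value each
time_probs = keyframe_time_distributions(offset_probs, seq_len=6)
targets = soft_keyframe_targets(time_probs, frames)
# With one-hot offsets the keyframes land exactly on time steps 2 and 5.
```

Because every step is a convolution or a weighted sum, gradients can flow from a reconstruction loss on the soft targets back to the offset distributions, which is the property a fully differentiable formulation needs.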
Taxonomy
Topics: Artificial Intelligence in Games · Video Analysis and Summarization · Multimodal Machine Learning Applications
