Motion Modes: What Could Happen Next?
Karran Pandey, Matheus Gadelha, Yannick Hold-Geoffroy, Karan Singh, Niloy J. Mitra, Paul Guerrero

TL;DR
Motion Modes is a training-free method that explores the latent space of a pre-trained image-to-video generator to produce diverse, realistic object motions from a static image, while disentangling object motion from camera motion.
Contribution
The paper introduces a training-free approach that uses energy-guided exploration of a pre-trained image-to-video generator's latent space to generate diverse motions for selected objects in static images. The authors report that it surpasses prior methods.
Findings
Generates realistic, diverse object animations.
Outperforms previous motion prediction methods.
Achieves results comparable to human predictions.
Abstract
Predicting diverse object motions from a single static image remains challenging, as current video generation models often entangle object movement with camera motion and other scene changes. While recent methods can predict specific motions from motion arrow input, they rely on synthetic data and predefined motions, limiting their application to complex scenes. We introduce Motion Modes, a training-free approach that explores a pre-trained image-to-video generator's latent distribution to discover various distinct and plausible motions focused on selected objects in static images. We achieve this by employing a flow generator guided by energy functions designed to disentangle object and camera motion. Additionally, we use an energy inspired by particle guidance to diversify the generated motions, without requiring explicit training data. Experimental results demonstrate that Motion…
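The diversity term mentioned in the abstract can be pictured with a toy example. The sketch below is not the authors' implementation: it omits the image-to-video generator entirely and only illustrates the general idea of a repulsive pairwise energy, in the spirit of particle guidance, applied to a batch of sample vectors. The function names, the RBF kernel, the bandwidth, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def diversity_energy(samples, bandwidth=1.0):
    """Sum of pairwise RBF kernel values; large when samples sit close together."""
    diffs = samples[:, None, :] - samples[None, :, :]      # (n, n, d), diffs[i, j] = x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)                  # (n, n)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Drop the diagonal (kernel of a sample with itself is 1) and count each pair once.
    return 0.5 * (np.sum(kernel) - samples.shape[0])

def diversity_energy_grad(samples, bandwidth=1.0):
    """Analytic gradient of diversity_energy with respect to every sample."""
    diffs = samples[:, None, :] - samples[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # dE/dx_i = -sum_j k_ij * (x_i - x_j) / h^2; the diagonal contributes zero.
    return -np.sum(kernel[:, :, None] * diffs, axis=1) / bandwidth ** 2

def guided_step(samples, step_size=0.1, bandwidth=1.0):
    """One descent step on the energy, which pushes nearby samples apart."""
    return samples - step_size * diversity_energy_grad(samples, bandwidth)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for a batch of latent samples: 4 points in 2-D.
    x = rng.normal(scale=0.1, size=(4, 2))
    print("energy before:", diversity_energy(x))
    for _ in range(20):
        x = guided_step(x)
    print("energy after: ", diversity_energy(x))
```

In an actual guided sampler, a gradient of this kind would typically be added to the generator's update at each sampling step, alongside any other energy terms such as those the abstract describes for separating object motion from camera motion; how the paper combines them is not specified in the text above.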
Taxonomy
Topics: Dynamics and Control of Mechanical Systems · Geophysics and Sensor Technology
