Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi, Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai,, Hongsheng Li

TL;DR
Motion-I2V introduces a two-stage framework for image-to-video generation that explicitly models motion, enabling more consistent and controllable videos with large motions and viewpoint changes.
Contribution
The paper proposes a novel explicit motion modeling framework with a diffusion-based predictor and motion-augmented attention, improving control and consistency in image-to-video generation.
Findings
Outperforms prior methods in video consistency and controllability.
Supports precise motion control via sparse trajectory annotations.
Enables zero-shot video-to-video translation.
Abstract
We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the reference image's pixels. For the second stage, we propose motion-augmented temporal attention to enhance the limited 1-D temporal attention in video latent diffusion models. This module can effectively propagate reference image's feature to synthesized frames with the guidance of predicted trajectories from the first stage. Compared with existing methods, Motion-I2V can generate more consistent videos even at the presence of large motion and viewpoint variation. By training a sparse trajectory…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Vision and Imaging · Advanced Image Processing Techniques · Computer Graphics and Visualization Techniques
MethodsDiffusion
