MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi, Biao Gong, Xi Chen, Dandan Zheng, Shuai Tan, Zizheng Yang, Yuyuan Li, Jingwen He, Kecheng Zheng, Jingdong Chen, Ming Yang, Yinqiang Zheng

TL;DR
MotionStone introduces a novel decoupled motion estimator for image-to-video (I2V) generation. It enables more accurate control of local (object) and global (camera) motion, and achieves state-of-the-art results through scalable, contrastive-learning-based training.
Contribution
The paper proposes a new decoupled motion estimator trained with contrastive learning. It improves motion-intensity measurement for I2V generation and, in turn, the stability and performance of the generation model.
Findings
The motion estimator accurately measures object and camera motion intensities.
MotionStone achieves state-of-the-art results on image-to-video generation tasks.
The decoupled estimator is scalable and serves as a versatile plug-in for various video processing applications.
Abstract
Image-to-video (I2V) generation is conditioned on a static image, and has recently been enhanced by using motion intensity as an additional control signal. These motion-aware models are appealing because they can generate diverse motion patterns, yet there is no reliable motion estimator for training such models on large-scale, in-the-wild video sets. Traditional metrics, e.g., SSIM or optical flow, generalize poorly to arbitrary videos, and it is also very difficult for human annotators to label abstract motion intensity. Furthermore, motion intensity should reflect both local object motion and global camera movement, which has not been studied before. This paper addresses the challenge with a new motion estimator capable of measuring the decoupled motion intensities of objects and cameras in a video. We leverage contrastive learning on randomly paired videos and distinguish…
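The abstract describes contrastive learning on randomly paired videos with separate object and camera intensities, but the excerpt is truncated before the loss is specified. The sketch below is therefore only an assumption: it treats each pair as a ranking problem and applies a Bradley-Terry style logistic loss, log(1 + exp(-y(s_a - s_b))), independently to a hypothetical object head and camera head. The names `MotionScores`, `pairwise_logistic_loss`, and `decoupled_pair_loss` are illustrative and not taken from the paper. The pair ordering labels are passed in by the caller, because how MotionStone obtains them is not stated here.

```python
import math
from dataclasses import dataclass


@dataclass
class MotionScores:
    """Hypothetical output of a decoupled motion estimator for one clip."""
    obj: float  # predicted local (object) motion intensity
    cam: float  # predicted global (camera) motion intensity


def pairwise_logistic_loss(score_a: float, score_b: float, a_moves_more: bool) -> float:
    """Assumed ranking loss log(1 + exp(-y * (s_a - s_b))), with y = +1 if clip A moves more."""
    y = 1.0 if a_moves_more else -1.0
    margin = y * (score_a - score_b)
    # Numerically stable log(1 + exp(-margin)) for large positive or negative margins.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def decoupled_pair_loss(a: MotionScores, b: MotionScores,
                        a_more_obj: bool, a_more_cam: bool) -> float:
    """Sum of independent ranking losses for the object and camera heads (hypothetical)."""
    return (pairwise_logistic_loss(a.obj, b.obj, a_more_obj)
            + pairwise_logistic_loss(a.cam, b.cam, a_more_cam))


if __name__ == "__main__":
    # Clip A has more object motion, clip B has more camera motion.
    a = MotionScores(obj=2.0, cam=0.1)
    b = MotionScores(obj=0.5, cam=0.3)
    print(decoupled_pair_loss(a, b, a_more_obj=True, a_more_cam=False))
```

Keeping two separate ranking terms means a pair can disagree on the two axes, e.g. more object motion but less camera motion, without one signal overwriting the other. A single scalar intensity cannot express this.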
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Optical Imaging Technologies · CCD and CMOS Imaging Sensors
Methods: Sparse Evolutionary Training · Contrastive Learning
