Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He, Shuohang Wang, Jianwei Yang, Xiaoxia Wu, Yiping Wang, Kuan Wang, Zheng Zhan, Olatunji Ruwase, Yelong Shen, Xin Eric Wang

TL;DR
Mojito is a diffusion model for text-to-video generation that controls both the trajectory and the intensity of motion. It steers object direction through cross-attention without additional training and uses optical flow maps to guide motion intensity.
Contribution
Introduces Mojito, a diffusion-based framework with novel modules for directional motion control and intensity modulation, enhancing controllability and efficiency in text-to-video synthesis.
Findings
Achieves accurate motion trajectory control matching specified directions.
Effectively modulates motion intensity guided by optical flow.
Demonstrates high computational efficiency and realistic motion dynamics.
Abstract
Recent advancements in diffusion models have shown great promise in producing high-quality video content. However, efficiently training video diffusion models capable of integrating directional guidance and controllable motion intensity remains a challenging and under-explored area. To tackle these challenges, this paper introduces Mojito, a diffusion model that incorporates both motion trajectory and intensity control for text-to-video generation. Specifically, Mojito features a Directional Motion Control (DMC) module that leverages cross-attention to efficiently direct the generated object's motion without training, alongside a Motion Intensity Modulator (MIM) that uses optical flow maps generated from videos to guide varying levels of motion intensity. Extensive experiments demonstrate Mojito's effectiveness in achieving precise trajectory and intensity control with high…
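To make "motion intensity" from optical flow concrete, here is a minimal sketch of one common way to reduce a flow map to a single intensity value: the mean magnitude of the per-pixel displacement vectors. This is an illustrative assumption, not the paper's Motion Intensity Modulator. The function names `motion_intensity` and `video_intensity` and the nested-list flow layout are hypothetical, and the abstract does not say how Mojito actually derives or conditions on intensity.

```python
import math


def motion_intensity(flow):
    """Mean optical-flow magnitude for one pair of consecutive frames.

    `flow` is a 2D grid (a list of rows) of (dx, dy) displacement
    vectors in pixels. This layout is an assumption for illustration.
    """
    magnitudes = [math.hypot(dx, dy) for row in flow for dx, dy in row]
    return sum(magnitudes) / len(magnitudes)


def video_intensity(flows):
    """Average the per-frame-pair intensities across a whole clip."""
    return sum(motion_intensity(f) for f in flows) / len(flows)


# A 2x2 flow field where every pixel moves 3 px right and 4 px down.
uniform = [[(3.0, 4.0), (3.0, 4.0)], [(3.0, 4.0), (3.0, 4.0)]]
# A 2x2 flow field with no motion at all.
still = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]

print(motion_intensity(uniform))          # 5.0
print(video_intensity([uniform, still]))  # 2.5
```

A scalar of this kind could serve as a conditioning signal that separates low-motion clips from high-motion ones. In practice the flow maps would come from an optical flow estimator run on the training videos.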
Taxonomy
Topics: Multimedia Communication and Technology · Advanced Vision and Imaging · Human Motion and Animation
Methods: ALIGN · Diffusion
