Mojito: Motion Trajectory and Intensity Control for Video Generation

Xuehai He; Shuohang Wang; Jianwei Yang; Xiaoxia Wu; Yiping Wang; Kuan Wang; Zheng Zhan; Olatunji Ruwase; Yelong Shen; Xin Eric Wang

arXiv:2412.08948·cs.CV·February 6, 2025

TL;DR

Mojito is a diffusion model for text-to-video generation that controls both the trajectory and the intensity of motion, using cross-attention to steer direction and optical flow maps to set intensity.

Contribution

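Introduces Mojito, a diffusion-based text-to-video framework with two modules: Directional Motion Control (DMC), which uses cross-attention to direct object motion without training, and a Motion Intensity Modulator (MIM), which uses optical flow maps generated from videos to guide the level of motion intensity.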

Findings

01. Achieves accurate motion trajectory control matching specified directions.

02. Effectively modulates motion intensity guided by optical flow.

03. Demonstrates high computational efficiency and realistic motion dynamics.

Abstract

Recent advancements in diffusion models have shown great promise in producing high-quality video content. However, efficiently training video diffusion models capable of integrating directional guidance and controllable motion intensity remains a challenging and under-explored area. To tackle these challenges, this paper introduces Mojito, a diffusion model that incorporates both motion trajectory and intensity control for text-to-video generation. Specifically, Mojito features a Directional Motion Control (DMC) module that leverages cross-attention to efficiently direct the generated object's motion without training, alongside a Motion Intensity Modulator (MIM) that uses optical flow maps generated from videos to guide varying levels of motion intensity. Extensive experiments demonstrate Mojito's effectiveness in achieving precise trajectory and intensity control with high…
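
To make the idea of deriving a motion-intensity signal from optical flow maps concrete, here is a minimal sketch. It is not the paper's implementation: the function names `motion_intensity` and `intensity_bucket`, the `(T-1, H, W, 2)` array layout, the use of mean flow magnitude as the score, and the bucket edges are all assumptions chosen for illustration. The flow fields are assumed to come from some external optical flow estimator.

```python
import numpy as np


def motion_intensity(flows: np.ndarray) -> float:
    """Return the mean optical-flow magnitude of a clip.

    flows: array of shape (T-1, H, W, 2) holding per-pixel (dx, dy)
    displacements between consecutive frames (layout is an assumption).
    """
    if flows.ndim != 4 or flows.shape[-1] != 2:
        raise ValueError("expected flows with shape (T-1, H, W, 2)")
    # Per-pixel displacement length, then average over time and space.
    magnitude = np.linalg.norm(flows, axis=-1)
    return float(magnitude.mean())


def intensity_bucket(score: float, edges=(1.0, 4.0)) -> int:
    """Map a score to a discrete level (0 = low, 1 = medium, 2 = high).

    The edge values are hypothetical; they are not taken from the paper.
    """
    return int(np.digitize(score, edges))


# Synthetic example: every pixel moves by (3, 4) pixels per frame.
flows = np.zeros((7, 16, 16, 2), dtype=np.float32)
flows[..., 0] = 3.0
flows[..., 1] = 4.0
score = motion_intensity(flows)
level = intensity_bucket(score)
```

A scalar like this could serve as a conditioning value during training and as a user-facing control at inference. How Mojito actually encodes and injects the intensity signal is not stated in the text above.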

Taxonomy

Topics: Multimedia Communication and Technology · Advanced Vision and Imaging · Human Motion and Animation

Methods: ALIGN · Diffusion