MotionBooth: Motion-Aware Customized Text-to-Video Generation

Jianzong Wu; Xiangtai Li; Yanhong Zeng; Jiangning Zhang; Qianyu Zhou; Yining Li; Yunhai Tong; Kai Chen

arXiv:2406.17758 · cs.CV · October 30, 2024 · 1 citation

TL;DR

MotionBooth is a novel framework for precise, motion-aware text-to-video generation of customized subjects from only a few images. It relies on new loss functions during fine-tuning and on inference-time techniques for motion control.
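The abstract says subject motion is governed by manipulating cross-attention maps at inference time. The sketch below is illustrative only and does not reproduce MotionBooth's code. It assumes that manipulation means adding a fixed bias to the subject token's pre-softmax cross-attention scores inside a per-frame bounding box and subtracting it outside. A box is moved along a straight path across frames to suggest a trajectory. The names `interpolate_boxes`, `edit_subject_logits`, `token_map`, and the `strength` value are hypothetical.

```python
import math
from typing import List, Tuple

# (top, left, bottom, right) in latent cells; bottom/right are exclusive.
Box = Tuple[int, int, int, int]


def interpolate_boxes(start: Box, end: Box, num_frames: int) -> List[Box]:
    """Linearly interpolate one box per frame from `start` to `end` (num_frames >= 2)."""
    boxes: List[Box] = []
    for i in range(num_frames):
        t = i / (num_frames - 1)
        boxes.append(tuple(round(a + (b - a) * t) for a, b in zip(start, end)))
    return boxes


def edit_subject_logits(logits: List[List[float]], width: int, token: int,
                        box: Box, strength: float = 5.0) -> List[List[float]]:
    """Return a copy of per-pixel cross-attention logits with the subject token's
    score raised inside `box` and lowered outside it (assumed edit rule).
    logits[y * width + x][k] is the pre-softmax score of pixel (y, x) for token k."""
    top, left, bottom, right = box
    edited = [row[:] for row in logits]
    for idx, row in enumerate(edited):
        y, x = divmod(idx, width)
        inside = top <= y < bottom and left <= x < right
        row[token] += strength if inside else -strength
    return edited


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over the prompt tokens of one pixel."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_map(logits: List[List[float]], width: int, token: int) -> List[List[float]]:
    """Softmax each pixel over tokens, then reshape one token's weights into rows."""
    probs = [softmax(row)[token] for row in logits]
    return [probs[i:i + width] for i in range(0, len(probs), width)]


# Toy example: a 2 x 2 latent with 2 prompt tokens; token 1 plays the subject token.
logits = [[0.0, 0.0] for _ in range(4)]
frame_boxes = interpolate_boxes((0, 0, 1, 1), (1, 1, 2, 2), num_frames=2)
maps = [token_map(edit_subject_logits(logits, 2, 1, b), 2, 1) for b in frame_boxes]
```

With these toy inputs, the subject token's attention concentrates in the top-left cell for the first frame and in the bottom-right cell for the second. In a real model, the same idea would be applied inside the video UNet's cross-attention layers. The exact placement, strength, and schedule used by MotionBooth may differ.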

Contribution

We introduce MotionBooth, a framework that combines fine-tuning, novel loss functions, and inference methods for motion-aware customized text-to-video synthesis.
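The abstract names three training losses: a subject region loss, a video preservation loss, and a subject token cross-attention loss. The sketch below makes several assumptions and is not the paper's formulation. The subject region loss is modeled as a denoising MSE restricted to a binary subject mask. The video preservation loss is treated as an ordinary denoising loss on general video data, so it is simply passed in as a number. The cross-attention loss is stood in for by an MSE between the subject token's normalized attention map and the mask. The function names and the weights `w_pres` and `w_token` are hypothetical placeholders.

```python
from typing import List

Grid = List[List[float]]  # one H x W map stored as nested lists


def subject_region_loss(pred_noise: Grid, true_noise: Grid, mask: Grid) -> float:
    """Denoising MSE averaged only over cells where the binary subject mask is 1,
    so the background of the few training images contributes nothing."""
    total, count = 0.0, 0.0
    for p_row, t_row, m_row in zip(pred_noise, true_noise, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            total += m * (p - t) ** 2
            count += m
    return total / count if count else 0.0


def token_attention_loss(attn: Grid, mask: Grid) -> float:
    """Stand-in alignment loss: MSE between the subject token's normalized
    cross-attention map and the subject mask, averaged over all cells."""
    diffs = [(a - m) ** 2
             for a_row, m_row in zip(attn, mask)
             for a, m in zip(a_row, m_row)]
    return sum(diffs) / len(diffs)


def total_loss(region: float, preservation: float, token: float,
               w_pres: float = 1.0, w_token: float = 1.0) -> float:
    """Hypothetical weighted sum of the three terms; the weights are placeholders."""
    return region + w_pres * preservation + w_token * token


region = subject_region_loss([[1.0, 2.0], [0.0, 5.0]],
                             [[0.0, 0.0], [0.0, 0.0]],
                             [[1, 0], [0, 1]])
token = token_attention_loss([[0.5, 0.5], [0.0, 0.0]], [[1, 0], [0, 0]])
```

Masking the loss means the large error at the unmasked cell (value 2.0 above) does not affect the result. This is the intuition behind learning the subject rather than the background of the reference images.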

Findings

01. Effective subject appearance preservation during motion control
02. Superior motion accuracy compared to baseline methods
03. Robustness across diverse subjects and motions

Abstract

In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately. Our approach introduces a subject region loss and a video preservation loss to improve subject learning, along with a subject token cross-attention loss that integrates the customized subject with motion control signals. Additionally, we propose training-free techniques for controlling subject and camera motion during inference. In particular, we manipulate cross-attention maps to govern subject motion and introduce a novel latent shift module to control camera movement. MotionBooth excels in preserving the appearance of subjects while simultaneously…
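The abstract attributes camera control to a training-free latent shift module. The sketch below shows one simple way to realize a latent shift and does not reproduce MotionBooth's module. Each frame's latent is translated by a per-frame offset proportional to an assumed camera speed (`pan_x`, `pan_y`, in latent cells per frame), so content drifts opposite to the camera. Cells exposed by the shift are filled by repeating the nearest edge value. That fill rule is an assumption, and how MotionBooth fills vacated regions is not stated here. The function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one latent channel of one frame, H x W


def shift_frame(frame: Frame, dx: int, dy: int) -> Frame:
    """Translate content by (dx, dy) cells. Cells uncovered by the shift copy the
    nearest original edge value (edge replication, an assumed fill rule)."""
    h, w = len(frame), len(frame[0])
    shifted = []
    for y in range(h):
        src_y = min(max(y - dy, 0), h - 1)
        shifted.append([frame[src_y][min(max(x - dx, 0), w - 1)] for x in range(w)])
    return shifted


def latent_shift(video: List[Frame], pan_x: int, pan_y: int) -> List[Frame]:
    """Shift frame f by f * (pan_x, pan_y) in the opposite direction, so a camera
    panning right (pan_x > 0) makes scene content drift left over time."""
    return [shift_frame(frame, -f * pan_x, -f * pan_y)
            for f, frame in enumerate(video)]


frame = [[1, 2, 3], [4, 5, 6]]
panned = latent_shift([frame, frame], pan_x=1, pan_y=0)
```

In a diffusion pipeline, a shift like this would act on the noisy latent between denoising steps, and the choice of steps matters. That scheduling is deliberately left out of this sketch.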

Code & Models

Models

jianzongwu/MotionBooth (Hugging Face model)

Datasets

jianzongwu/MotionBooth (dataset, 160 downloads)

Videos

MotionBooth: Motion-Aware Customized Text-to-Video Generation (SlidesLive)

Taxonomy

Topics: Human Motion and Animation · Video Analysis and Summarization · Multimedia Communication and Technology