Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Yixuan Ren; Yang Zhou; Jimei Yang; Jing Shi; Difan Liu; Feng Liu; Mingi Kwon; Abhinav Shrivastava

arXiv:2402.14780·cs.CV·August 29, 2024·1 cites

Open Access

TL;DR

This paper introduces a method for one-shot motion customization in text-to-video diffusion models, enabling the transfer of motion from a single reference video to new subjects and scenes with spatial and temporal variations.

Contribution

It applies LoRA to the temporal attention layers of a pre-trained T2V diffusion model and introduces appearance absorbers that disentangle appearance from motion, enabling flexible video customization.

Findings

01 Effective motion transfer from a single reference video

02 Enables various downstream video editing tasks

03 Plug-and-play inference for easy extension

Abstract

Image customization has been extensively studied in text-to-image (T2I) diffusion models, leading to impressive outcomes and applications. With the emergence of text-to-video (T2V) diffusion models, its temporal counterpart, motion customization, has not yet been well investigated. To address the challenge of one-shot video motion customization, we propose Customize-A-Video that models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal varieties. It leverages low-rank adaptation (LoRA) on temporal attention layers to tailor the pre-trained T2V diffusion model for specific motion modeling. To disentangle the spatial and temporal information during training, we introduce a novel concept of appearance absorbers that detach the original appearance from the reference video prior to motion learning. The proposed modules are trained…
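The low-rank adaptation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LoRALinear`, the helper `matmul`, and the `rank`, `alpha`, and `seed` parameters are hypothetical, and the sketch assumes the common LoRA formulation in which a frozen weight W is augmented by a scaled product of two small matrices, W + (alpha / rank) · B · A. Which projections inside a given T2V model count as temporal attention layers depends on that model and is not shown here.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical linear projection with a low-rank trainable update.

    The base weight stays frozen; only A and B would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                      # frozen, shape d_out x d_in
        self.scale = alpha / rank
        # A starts with small random values, B with zeros, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def delta(self):
        """Low-rank update scale * B @ A, shape d_out x d_in."""
        return [[self.scale * v for v in row] for row in matmul(self.B, self.A)]

    def forward(self, x):
        """Apply y = x @ (W + delta)^T to a batch of row vectors x."""
        adapted = [
            [w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, self.delta())
        ]
        transposed = [list(col) for col in zip(*adapted)]
        return matmul(x, transposed)

    def num_trainable(self):
        """Number of trainable values in A and B."""
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
output = layer.forward([[1.0, 0.0]])
```

Because B is initialized to zero in this sketch, the adapted layer matches the frozen layer before any training step. Only A and B would receive gradient updates, which also means the learned update can be stored separately from the base weights.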

Taxonomy

Topics: Multimedia Communication and Technology

Methods: Diffusion