CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training

Xiuli Bi; Jian Lu; Bo Liu; Xiaodong Cun; Yong Zhang; Weisheng Li; Bin Xiao

arXiv:2412.15646·cs.CV·December 24, 2024

TL;DR

CustomTTT uses test-time training to combine separately trained appearance and motion LoRAs in a video diffusion model, reducing the artifacts that appear when such LoRAs are merged directly.

Contribution

The paper proposes a novel test-time training technique for combining multiple customized concepts in video diffusion models, improving quality and flexibility.

Findings

01

Outperforms state-of-the-art methods in qualitative evaluations.

02

Effectively combines multiple customized concepts without artifacts.

03

Demonstrates improved video quality through test-time training.

Abstract

Benefiting from large-scale pre-training on text-video pairs, current text-to-video (T2V) diffusion models can generate high-quality videos from a text description. Moreover, given a few reference images or videos, parameter-efficient fine-tuning methods such as LoRA can learn high-quality customized concepts, e.g., a specific subject or the motion from a reference video. However, combining multiple concepts trained on different references into a single network produces obvious artifacts. To this end, we propose CustomTTT, which makes it easy to jointly customize the appearance and the motion of the given video. In detail, we first analyze the influence of the prompt in current video diffusion models and find that LoRAs are needed only in specific layers for appearance and motion customization. Moreover, since each LoRA is trained individually, we propose a novel test-time training…
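
To make the layer-specific LoRA idea in the abstract concrete, here is a minimal sketch of merging two independently trained LoRA updates into different layers of a base network. It is an illustration under stated assumptions, not the authors' implementation: the layer names (`spatial_attn`, `temporal_attn`), the assignment of appearance and motion to those layers, the matrix sizes, and the helper names are all hypothetical, and the test-time training step is not shown.

```python
# Minimal LoRA merge sketch in plain Python (no external libraries).
# ASSUMPTION: layer names, sizes, and the appearance/motion-to-layer
# assignment are hypothetical and chosen only for illustration.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def merge_lora(weight, lora_down, lora_up, scale=1.0):
    """Return weight + scale * (lora_up @ lora_down), the usual LoRA merge."""
    delta = matmul(lora_up, lora_down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def apply_selected(base_weights, loras, target_layers, scale=1.0):
    """Merge LoRA updates only into the layers named in target_layers."""
    merged = {}
    for name, weight in base_weights.items():
        if name in target_layers and name in loras:
            down, up = loras[name]
            merged[name] = merge_lora(weight, down, up, scale)
        else:
            # Untouched layers are copied so the base weights stay intact.
            merged[name] = [row[:] for row in weight]
    return merged

# Two hypothetical 2x2 layers, each initialised to the identity.
base = {
    "spatial_attn": [[1.0, 0.0], [0.0, 1.0]],
    "temporal_attn": [[1.0, 0.0], [0.0, 1.0]],
}

# Rank-1 LoRA factors stored as (down: r x d_in, up: d_out x r).
appearance_lora = {"spatial_attn": ([[1.0, 2.0]], [[0.5], [0.0]])}
motion_lora = {"temporal_attn": ([[0.0, 1.0]], [[0.0], [2.0]])}

# Apply each LoRA only to the layers it is assumed to target.
with_appearance = apply_selected(base, appearance_lora, {"spatial_attn"})
combined = apply_selected(with_appearance, motion_lora, {"temporal_attn"})
```

Restricting each LoRA to its own set of layers keeps the two updates from overwriting each other in this toy setting. The abstract indicates that an additional test-time training step is used because the LoRAs are trained individually.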

Code & Models

Repositories

rongpiking/customttt
Official

Videos

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training· underline

Taxonomy

Topics: Human Motion and Animation · Face recognition and analysis

Methods: Diffusion