DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang, Mengqi Huang, Yijing Tu, Zhendong Mao

TL;DR
DualReal introduces an adaptive joint training framework that fuses identity and motion in customized text-to-video generation, overcoming the identity-motion conflicts of previous isolated methods and significantly improving consistency and quality.
Contribution
The paper proposes DualReal, a novel framework with adaptive joint training and stage-guided control to enhance identity-motion fusion in video synthesis.
Findings
Improves CLIP-I by 21.7% and DINO-I by 31.8% on average.
Achieves top performance on nearly all motion metrics.
Constructs a comprehensive evaluation benchmark.
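CLIP-I and DINO-I are commonly defined (following DreamBooth) as the average cosine similarity between image embeddings of generated frames and embeddings of the reference subject images, computed with a CLIP or a DINO image encoder respectively. The paper's exact protocol (frame sampling, encoder checkpoints) is not given on this page, so the following is only a minimal sketch that assumes the embeddings are already computed; the function name `mean_pairwise_cosine` is hypothetical.

```python
import numpy as np


def mean_pairwise_cosine(gen_emb, ref_emb) -> float:
    """Average cosine similarity over every (generated frame, reference image) pair.

    gen_emb: array-like of shape (num_frames, dim), embeddings of generated frames.
    ref_emb: array-like of shape (num_refs, dim), embeddings of reference images.
    Feeding CLIP image features gives a CLIP-I-style score; DINO features give DINO-I.
    """
    gen = np.asarray(gen_emb, dtype=float)
    ref = np.asarray(ref_emb, dtype=float)
    # L2-normalize each row so that dot products become cosine similarities.
    gen = gen / np.linalg.norm(gen, axis=1, keepdims=True)
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    # (num_frames, num_refs) similarity matrix, averaged over all pairs.
    return float((gen @ ref.T).mean())
```

Usage would look like `mean_pairwise_cosine(frame_feats, ref_feats)`, where both (hypothetical) arrays come from the same image encoder. Higher values indicate that the generated frames preserve the reference identity more closely.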
Abstract
Customized text-to-video generation with pre-trained large-scale models has recently garnered significant attention for its focus on identity and motion consistency. Existing works typically follow the isolated customization paradigm, in which either the subject identity or the motion dynamics is customized exclusively. However, this paradigm ignores the intrinsic mutual constraints and synergistic interdependencies between identity and motion, leading to identity-motion conflicts throughout the generation process that systematically degrade generation quality. To address this, we introduce DualReal, a novel framework that employs adaptive joint training to collaboratively construct interdependencies between the two dimensions. Specifically, DualReal is composed of two units: (1) Dual-aware Adaptation dynamically switches the training step (i.e., identity or motion), learns the current information guided by the frozen…
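The idea of dynamically switching the training step between identity and motion can be illustrated with a toy. This is not the paper's method: the abstract is truncated before the switching criterion is described, so the quadratic losses, the coupling term, and the "train whichever dimension has the larger loss" rule below are all hypothetical assumptions. The sketch only shows the general mechanic of updating one parameter block per step while the other is held fixed.

```python
import numpy as np


def toy_losses_and_grads(identity, motion, target_id, target_mo, coupling=0.1):
    """Hypothetical quadratic stand-ins for identity and motion objectives.

    The shared coupling term makes each branch's optimum depend on the other,
    loosely mimicking an identity-motion interdependence.
    """
    diff = identity - motion
    loss_id = float(np.sum((identity - target_id) ** 2) + coupling * np.sum(diff ** 2))
    loss_mo = float(np.sum((motion - target_mo) ** 2) + coupling * np.sum(diff ** 2))
    grad_id = 2 * (identity - target_id) + 2 * coupling * diff  # d loss_id / d identity
    grad_mo = 2 * (motion - target_mo) - 2 * coupling * diff    # d loss_mo / d motion
    return loss_id, loss_mo, grad_id, grad_mo


def switched_joint_training(steps=200, lr=0.05, dim=4, seed=0):
    """Train two parameter blocks, updating only the active one at each step."""
    rng = np.random.default_rng(seed)
    target_id = rng.normal(size=dim)
    target_mo = rng.normal(size=dim)
    identity = np.zeros(dim)
    motion = np.zeros(dim)
    schedule = []
    for _ in range(steps):
        loss_id, loss_mo, grad_id, grad_mo = toy_losses_and_grads(
            identity, motion, target_id, target_mo)
        # Hypothetical switching rule: train whichever dimension currently has
        # the larger loss; the inactive block is left unchanged this step.
        if loss_id >= loss_mo:
            identity = identity - lr * grad_id
            schedule.append("identity")
        else:
            motion = motion - lr * grad_mo
            schedule.append("motion")
    loss_id, loss_mo, _, _ = toy_losses_and_grads(identity, motion, target_id, target_mo)
    return schedule, loss_id, loss_mo
```

Calling `switched_joint_training()` returns the sequence of chosen steps and the final toy losses. Because each update sees the current value of the other block through the coupling term, the two blocks are trained jointly rather than in isolation, which is the contrast the abstract draws with the isolated paradigm.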
Taxonomy
Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization · Image and Video Quality Assessment
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Adam · Dropout · Diffusion · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding
