DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

Wenchuan Wang; Mengqi Huang; Yijing Tu; Zhendong Mao

arXiv:2505.02192·cs.CV·July 22, 2025

TL;DR

DualReal introduces an adaptive joint training framework that effectively fuses identity and motion in text-to-video generation, overcoming conflicts of previous isolated methods and significantly improving consistency and quality.

Contribution

The paper proposes DualReal, a novel framework with adaptive joint training and stage-guided control to enhance identity-motion fusion in video synthesis.

Findings

01. Improves CLIP-I by 21.7% and DINO-I by 31.8% on average.

02. Achieves top performance on nearly all motion metrics.

03. Constructs a comprehensive evaluation benchmark.

Abstract

Customized text-to-video generation with pre-trained large-scale models has recently garnered significant attention by focusing on identity and motion consistency. Existing works typically follow the isolated customized paradigm, where either the subject identity or the motion dynamics is customized exclusively. However, this paradigm completely ignores the intrinsic mutual constraints and synergistic interdependencies between identity and motion, resulting in identity-motion conflicts that systematically degrade the generation process. To address this, we introduce DualReal, a novel framework that employs adaptive joint training to construct interdependencies between dimensions collaboratively. Specifically, DualReal is composed of two units: (1) Dual-aware Adaptation dynamically switches the training step (i.e., identity or motion), learns the current information guided by the frozen…
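The step switching described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `train` and `alternate`, the scalar "parameters", the quadratic loss, and the strict even/odd alternation rule are all assumptions made for illustration, and the abstract does not state how DualReal actually decides which step to run. The sketch only shows the general idea of a joint loop in which each step updates one dimension (identity or motion) while the other is left untouched.

```python
# Toy illustration of a joint training loop that switches between an
# "identity" step and a "motion" step. Everything here is hypothetical:
# real adapters are replaced by one scalar per dimension.

def alternate(step):
    """Assumed switching rule: even steps train identity, odd steps motion."""
    return "identity" if step % 2 == 0 else "motion"

def train(num_steps, schedule, lr=0.5):
    """Run num_steps updates, touching only the dimension chosen per step."""
    params = {"identity": 0.0, "motion": 0.0}    # stand-ins for adapter weights
    targets = {"identity": 1.0, "motion": -1.0}  # stand-ins for training signals
    history = []
    for step in range(num_steps):
        active = schedule(step)
        # Gradient of the toy loss 0.5 * (p - t)^2 for the active dimension only;
        # the inactive dimension is not updated in this step.
        grad = params[active] - targets[active]
        params[active] -= lr * grad
        history.append(active)
    return params, history

if __name__ == "__main__":
    final_params, order = train(4, alternate)
    print(order)
    print(final_params)
```

Any callable that maps a step index to `"identity"` or `"motion"` can be passed as `schedule`, so a different (for example, adaptive) switching rule could be swapped in without changing the loop.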

Taxonomy

Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization · Image and Video Quality Assessment

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Adam · Dropout · Diffusion · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding