Proteus-ID: ID-Consistent and Motion-Coherent Video Customization

Guiyu Zhang; Chen Shi; Zijian Jiang; Xunzhi Xiang; Jingjing Qian; Shaoshuai Shi; Li Jiang

arXiv:2506.23729 · cs.CV · February 4, 2026

TL;DR

Proteus-ID is a diffusion-based framework that generates realistic, identity-preserving, and motion-coherent customized videos from a single reference image and a text prompt.

Contribution

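Proteus-ID proposes a Multimodal Identity Fusion (MIF) module, a Time-Aware Identity Injection (TAII) mechanism for dynamic identity conditioning, and a motion learning component, all aimed at improving identity consistency and motion coherence in customized videos.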

Findings

01 Outperforms prior methods in identity preservation and motion realism

02 Achieves superior text alignment in generated videos

03 Establishes a new benchmark with the Proteus-Bench dataset

Abstract

Video identity customization seeks to synthesize realistic, temporally coherent videos of a specific subject, given a single reference image and a text prompt. This task presents two core challenges: (1) maintaining identity consistency while aligning with the described appearance and actions, and (2) generating natural, fluid motion without unrealistic stiffness. To address these challenges, we introduce Proteus-ID, a novel diffusion-based framework for identity-consistent and motion-coherent video customization. First, we propose a Multimodal Identity Fusion (MIF) module that unifies visual and textual cues into a joint identity representation using a Q-Former, providing coherent guidance to the diffusion model and eliminating modality imbalance. Second, we present a Time-Aware Identity Injection (TAII) mechanism that dynamically modulates identity conditioning across denoising steps,…
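The idea of modulating identity conditioning across denoising steps can be illustrated with a toy sketch. The abstract above is truncated and does not state how TAII computes its modulation, so everything here is an assumption made for illustration: the linear schedule, its direction, the additive injection, and the names `linear_schedule`, `inject_identity`, and `run_steps` are all hypothetical and are not taken from the Proteus-ID code. The denoiser itself is omitted; only the step-dependent weighting is shown.

```python
from typing import List, Tuple


def linear_schedule(step: int, num_steps: int, start: float, end: float) -> float:
    """Hypothetical schedule: interpolate the identity weight from start to end."""
    if num_steps < 2:
        return end
    frac = step / (num_steps - 1)
    return start + frac * (end - start)


def inject_identity(hidden: List[float], identity: List[float], weight: float) -> List[float]:
    """Add the identity feature to the hidden state, scaled by the step weight."""
    if len(hidden) != len(identity):
        raise ValueError("hidden and identity must have the same length")
    return [h + weight * i for h, i in zip(hidden, identity)]


def run_steps(
    hidden: List[float],
    identity: List[float],
    num_steps: int,
    start: float,
    end: float,
) -> Tuple[List[float], List[float]]:
    """Apply step-dependent identity injection; a real model would denoise at each step."""
    weights = []
    for step in range(num_steps):
        w = linear_schedule(step, num_steps, start, end)
        hidden = inject_identity(hidden, identity, w)  # denoiser call omitted in this sketch
        weights.append(w)
    return hidden, weights


if __name__ == "__main__":
    final_hidden, used_weights = run_steps([0.0, 0.0], [1.0, 2.0], num_steps=5, start=0.0, end=1.0)
    print(used_weights)
    print(final_hidden)
```

Swapping `start` and `end` reverses the direction of the schedule, which makes it easy to compare stronger identity guidance early versus late in sampling; which of these (if either) matches the paper's mechanism is not determinable from the text above.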

Code & Models

Models

🤗 fateforward/Proteus-ID (model · 1 download · 1 like)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Multimodal Machine Learning Applications