Slot-ID: Identity-Preserving Video Generation from Reference Videos via Slot-Based Temporal Identity Encoding
Yixuan Lai, He Wang, Kun Zhou, Tianjia Shao

TL;DR
This paper introduces Slot-ID, a video generation method that uses short reference videos to better preserve identity and natural facial dynamics across poses and expressions, improving over single-image conditioning.
Contribution
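The paper proposes an identity-conditioned variant of a diffusion-transformer video generator that conditions on a short reference video instead of a single portrait. A Sinkhorn-routed encoder distills the clip into compact identity tokens that capture the subject's characteristic dynamics, with the aim of improving identity preservation and realism.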
Findings
Improves identity retention under pose variations
Maintains high visual realism and prompt faithfulness
Effective across diverse subjects and expressions
Abstract
Producing prompt-faithful videos that preserve a user-specified identity remains challenging: models need to extrapolate facial dynamics from sparse reference while balancing the tension between identity preservation and motion naturalness. Conditioning on a single image completely ignores the temporal signature, which leads to pose-locked motions, unnatural warping, and "average" faces when viewpoints and expressions change. To this end, we introduce an identity-conditioned variant of a diffusion-transformer video generator which uses a short reference video rather than a single portrait. Our key idea is to incorporate the dynamics in the reference. A short clip reveals subject-specific patterns, e.g., how smiles form, across poses and lighting. From this clip, a Sinkhorn-routed encoder learns compact identity tokens that capture characteristic dynamics while remaining pretrained…
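Illustrative sketch
The abstract names a Sinkhorn-routed encoder but this listing does not describe its internals. As a rough illustration of what Sinkhorn-style routing of reference-clip features into a few identity tokens can look like in general, the NumPy sketch below balances a token-to-slot assignment matrix and pools tokens per slot. Everything here is an assumption for illustration, not the authors' implementation: the function names `sinkhorn` and `route_to_slots`, the scaled dot-product scoring, the uniform marginals, and the shapes are all hypothetical.

```python
import numpy as np


def sinkhorn(scores, n_iters=50):
    """Balance a (n_tokens, n_slots) score matrix into a soft assignment.

    Rows (tokens) are pushed toward summing to 1 and columns (slots)
    toward summing to n_tokens / n_slots, so no slot is left unused.
    Uniform marginals are an assumption made for this sketch.
    """
    n_tokens, n_slots = scores.shape
    # Row-wise max subtraction keeps exp() stable; it does not change the
    # result because the first step below is a row normalisation.
    P = np.exp(scores - scores.max(axis=1, keepdims=True))
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)                         # each token spreads mass 1
        P = P / P.sum(axis=0, keepdims=True) * (n_tokens / n_slots)  # each slot gets equal mass
    return P


def route_to_slots(tokens, slot_queries):
    """Pool per-frame tokens into slot vectors using the balanced assignment.

    tokens:       (n_tokens, dim) features from the reference clip (hypothetical)
    slot_queries: (n_slots, dim) slot query vectors (hypothetical)
    """
    dim = tokens.shape[1]
    scores = tokens @ slot_queries.T / np.sqrt(dim)   # scaled dot-product affinity
    P = sinkhorn(scores)
    weights = P / P.sum(axis=0, keepdims=True)        # per-slot weights sum to 1
    return weights.T @ tokens, P                      # (n_slots, dim), (n_tokens, n_slots)


# Toy usage: 8 frames x 4 patch tokens each, 16-dim features, 4 identity slots.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(32, 16))
slot_queries = rng.normal(size=(4, 16))
identity_tokens, assignment = route_to_slots(tokens, slot_queries)
print(identity_tokens.shape, assignment.shape)
```

Compared with a plain softmax over slots, the column-normalisation step is what keeps any single slot from absorbing most of the tokens; whether Slot-ID uses this particular balancing scheme is not stated in the text above.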
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Face Recognition and Perception
