DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Xu Guo; Fulong Ye; Xinghui Li; Pengqi Tu; Pengze Zhang; Qichao Sun; Songtao Zhao; Xiangwang Hou; Qian He

arXiv:2601.01425·cs.CV·January 6, 2026

TL;DR

DreamID-V introduces a diffusion transformer framework for high-fidelity face swapping in videos, effectively maintaining identity, attributes, and temporal consistency, and is supported by a new benchmark dataset.

Contribution

It presents a Diffusion Transformer-based approach to video face swapping, combining a dedicated data pipeline (SyncID-Pipe) with training strategies aimed at improving swap quality.

Findings

01. Outperforms existing state-of-the-art methods

02. Achieves high visual realism and identity preservation

03. Demonstrates versatility across various swap tasks

Abstract

Video Face Swapping (VFS) requires seamlessly injecting a source identity into a target video while meticulously preserving the original pose, expression, lighting, background, and dynamic information. Existing methods struggle to maintain identity similarity and attribute preservation while also keeping temporal consistency. To address this challenge, we propose a comprehensive framework to seamlessly transfer the superiority of Image Face Swapping (IFS) to the video domain. We first introduce a novel data pipeline, SyncID-Pipe, that pre-trains an Identity-Anchored Video Synthesizer and combines it with IFS models to construct bidirectional ID quadruplets for explicit supervision. Building upon this paired data, we propose the first Diffusion Transformer-based framework, DreamID-V, employing a core Modality-Aware Conditioning module to discriminatively inject multi-modal conditions. Meanwhile, we…

Code & Models

Models

XuGuo699/DreamID-V (Hugging Face model, ♡ 61)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Speech and Audio Processing