Identity-Preserving Video Dubbing Using Motion Warping

Runzhen Liu; Qinjie Lin; Yunfei Liu; Lijian Lin; Ye Zhu; Yu Li; Chuhua Xian; Fa-Ting Hong

arXiv:2501.04586 · cs.CV · January 10, 2025

Open Access

TL;DR

This paper introduces IPTalker, a transformer-based framework for video dubbing that achieves high-fidelity identity preservation and lip-sync accuracy by dynamically aligning audio cues with reference visuals and refining the generated videos.

Contribution

The paper presents a novel transformer-based alignment mechanism combined with motion warping and refinement strategies to improve identity preservation in video dubbing.

Findings

01 Outperforms existing methods in realism and lip-sync accuracy

02 Achieves superior identity retention in generated videos

03 Establishes new state-of-the-art in identity-consistent video dubbing

Abstract

Video dubbing aims to synthesize realistic, lip-synced videos from a reference video and a driving audio signal. Although existing methods can accurately generate mouth shapes driven by audio, they often fail to preserve identity-specific features, largely because they do not effectively capture the nuanced interplay between audio cues and the visual attributes of the reference identity. As a result, the generated outputs frequently lack fidelity in reproducing the unique textural and structural details of the reference identity. To address these limitations, we propose IPTalker, a novel and robust framework for video dubbing that achieves seamless alignment between driving audio and reference identity while ensuring both lip-sync accuracy and high-fidelity identity preservation. At the core of IPTalker is a transformer-based alignment mechanism designed to dynamically capture and model…

Taxonomy

Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Human Motion and Animation