IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer

Bo Chen; Tao Liu; Qi Chen; Xie Chen; Zilong Zheng

arXiv:2511.22167 · cs.CV · December 1, 2025

Open Access

TL;DR

IMTalker introduces an efficient, high-fidelity talking face generation framework that uses implicit motion transfer via cross-attention, improving motion accuracy, identity preservation, and synchronization over prior explicit flow-based methods.

Contribution

The paper proposes a novel implicit motion transfer approach that combines a cross-attention mechanism with an identity-adaptive module, enhancing global motion modeling and identity disentanglement in talking face synthesis.

Findings

01. Outperforms prior methods in motion accuracy and identity preservation.

02. Achieves real-time generation at 40-42 FPS on a high-end GPU.

03. Demonstrates superior audio-lip synchronization quality.

Abstract

Talking face generation aims to synthesize realistic speaking portraits from a single image, yet existing methods often rely on explicit optical flow and local warping, which fail to model complex global motions and cause identity drift. We present IMTalker, a novel framework that achieves efficient and high-fidelity talking face generation through implicit motion transfer. The core idea is to replace traditional flow-based warping with a cross-attention mechanism that implicitly models motion discrepancy and identity alignment within a unified latent space, enabling robust global motion rendering. To further preserve speaker identity during cross-identity reenactment, we introduce an identity-adaptive module that projects motion latents into personalized spaces, ensuring clear disentanglement between motion and identity. In addition, a lightweight flow-matching motion generator…
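
The idea of swapping flow-based warping for cross-attention can be illustrated with a toy example. The sketch below is not the paper's implementation. It is plain scaled dot-product cross-attention in pure Python, and it assumes that queries come from motion (driving) latents while keys and values come from source-identity features. The names `cross_attention`, `motion_queries`, `source_keys`, and `source_values` are hypothetical and chosen only for illustration. The point it shows is that each output token is a weighted mix of all source tokens, rather than a sample from one displaced location as in explicit warping.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention over lists of vectors.

    queries: one vector per output token (assumed here: motion latents)
    keys, values: one vector per source token (assumed here: identity features)
    Returns the attended outputs and the attention weights.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every source token (global, not local).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all source value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy data: three source tokens with 2-D keys and 3-D appearance values.
source_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
source_values = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
# Two hypothetical motion queries, each aligned with a different source key.
motion_queries = [[4.0, 0.0], [0.0, 4.0]]

rendered, attn = cross_attention(motion_queries, source_keys, source_values)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to one, so every output token draws on the whole source representation with soft weights. In a real model the queries, keys, and values would be learned projections of high-dimensional feature maps, and how IMTalker builds them is not specified in the text available here.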

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing