MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control

Fatemeh Nazarieh, Zhenhua Feng, Diptesh Kanojia, Muhammad Awais, and Josef Kittler

arXiv:2510.22810·cs.CV·October 28, 2025

TL;DR

MAGIC-Talk is a diffusion-based framework that generates customizable, temporally consistent talking faces from a single image, improving identity preservation, motion coherence, and long-video quality.

Contribution

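MAGIC-Talk introduces a one-shot diffusion approach built from two components: ReferenceNet, which preserves identity and supports fine-grained facial editing via text prompts, and AnimateNet, which improves motion coherence. Together they enable stable long-form video generation from a single reference image, without multiple references or fine-tuning.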

Findings

01. Outperforms state-of-the-art methods in visual quality and synchronization

02. Effectively maintains identity from a single image

03. Reduces flickering and motion inconsistencies in long videos

Abstract

Audio-driven talking face generation has gained significant attention for applications in digital media and virtual avatars. While recent methods improve audio-lip synchronization, they often struggle with temporal consistency, identity preservation, and customization, especially in long video generation. To address these issues, we propose MAGIC-Talk, a one-shot diffusion-based framework for customizable and temporally stable talking face generation. MAGIC-Talk consists of ReferenceNet, which preserves identity and enables fine-grained facial editing via text prompts, and AnimateNet, which enhances motion coherence using structured motion priors. Unlike previous methods requiring multiple reference images or fine-tuning, MAGIC-Talk maintains identity from a single image while ensuring smooth transitions across frames. Additionally, a progressive latent fusion strategy is introduced to…
