M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation

Kui Jiang; Shiyu Liu; Junjun Jiang; Hongxun Yao; Xiaopeng Fan

arXiv:2507.08307 · cs.CV · August 15, 2025

Open Access

TL;DR

This paper introduces M2DAO-Talker, a novel framework for talking-head video generation that employs multi-granular motion decoupling and alternating optimization to improve realism and reduce artifacts, achieving state-of-the-art results.

Contribution

The paper proposes a unified framework with a new 2D preprocessing pipeline, multi-granular motion decoupling, and an alternating optimization strategy for more accurate and realistic talking-head generation.

Findings

01. Achieves a 2.43 dB PSNR improvement over previous methods.

02. Attains a 0.64 gain in user-rated video realness.

03. Runs at 150 FPS, enabling real-time applications.
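
To put the PSNR figure in context, the sketch below shows the standard PSNR definition, 10·log10(peak² / MSE), and how a dB gain maps to a mean-squared-error ratio for a single image pair. It is a minimal illustration assuming 8-bit frames held as NumPy arrays. The function names `psnr` and `mse_ratio_from_db_gain` are hypothetical and do not come from the paper, whose exact evaluation protocol is not given here.

```python
import numpy as np


def psnr(reference, generated, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumes pixel values lie in [0, peak]; 255 is the usual peak for 8-bit frames.
    """
    # Cast first so uint8 subtraction cannot wrap around.
    ref = np.asarray(reference, dtype=np.float64)
    gen = np.asarray(generated, dtype=np.float64)
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


def mse_ratio_from_db_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain at a fixed peak value."""
    return 10.0 ** (gain_db / 10.0)
```

Under this definition, a 2.43 dB gain on one image pair corresponds to the MSE falling by a factor of about 1.75. A gain averaged over many frames or videos, as reported numbers usually are, only approximately follows that ratio.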

Abstract

Audio-driven talking head generation holds significant potential for film production. While existing 3D methods have advanced motion modeling and content synthesis, they often produce rendering artifacts, such as motion blur, temporal jitter, and local penetration, due to limitations in representing stable, fine-grained motion fields. Through systematic analysis, we reformulate talking head generation into a unified framework comprising three steps: video preprocessing, motion representation, and rendering reconstruction. This framework underpins our proposed M2DAO-Talker, which addresses current limitations via multi-granular motion decoupling and alternating optimization. Specifically, we devise a novel 2D portrait preprocessing pipeline to extract frame-wise deformation control conditions (motion region segmentation masks and camera parameters) to facilitate motion representation.…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · 3D Shape Modeling and Analysis