M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation
Kui Jiang, Shiyu Liu, Junjun Jiang, Hongxun Yao, Xiaopeng Fan

TL;DR
This paper introduces M2DAO-Talker, a novel framework for talking-head video generation that employs multi-granular motion decoupling and alternating optimization to improve realism and reduce artifacts, achieving state-of-the-art results.
Contribution
The paper proposes a unified framework with a new 2D preprocessing pipeline, multi-granular motion decoupling, and an alternating optimization strategy for more accurate and realistic talking-head generation.
Findings
Achieves a 2.43 dB PSNR improvement over previous methods.
Scores 0.64 higher on user-rated video realness.
Runs at 150 FPS, enabling real-time applications.
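To put the figures above in more concrete terms, the sketch below uses the standard PSNR definition, PSNR = 10 * log10(MAX^2 / MSE), to convert a dB gain into a reduction in mean squared error, and converts a frame rate into a per-frame time budget. The evaluation protocol is not given in this summary, so the 8-bit peak value of 255 and the helper names are assumptions made for illustration, not the authors' code.

```python
import math


def psnr(mse: float, max_value: float = 255.0) -> float:
    """Standard PSNR in dB for a given mean squared error.

    max_value=255.0 assumes 8-bit images (an assumption, not stated in the source).
    """
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


ratio = mse_ratio_for_gain(2.43)   # about 0.57, i.e. roughly 43% lower MSE
budget = frame_budget_ms(150)      # about 6.67 ms per frame

print(f"MSE ratio for +2.43 dB: {ratio:.3f}")
print(f"Per-frame budget at 150 FPS: {budget:.2f} ms")
```

Under these assumptions, a 2.43 dB gain corresponds to roughly 43% lower mean squared error, whatever the baseline error is, and 150 FPS leaves under 7 ms per frame, well inside the 33 to 40 ms budget of typical 25 to 30 FPS video.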
Abstract
Audio-driven talking head generation holds significant potential for film production. While existing 3D methods have advanced motion modeling and content synthesis, they often produce rendering artifacts, such as motion blur, temporal jitter, and local penetration, due to limitations in representing stable, fine-grained motion fields. Through systematic analysis, we reformulate talking head generation into a unified framework comprising three steps: video preprocessing, motion representation, and rendering reconstruction. This framework underpins our proposed M2DAO-Talker, which addresses current limitations via multi-granular motion decoupling and alternating optimization. Specifically, we devise a novel 2D portrait preprocessing pipeline to extract frame-wise deformation control conditions (motion region segmentation masks and camera parameters) to facilitate motion representation.…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · 3D Shape Modeling and Analysis
