DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Hanbo Cheng, Limin Lin, Chenyu Liu, Pengcheng Xia, Pengfei Hu, Jiefeng Ma, Jun Du, Jia Pan

TL;DR
DAWN introduces a non-autoregressive diffusion framework for generating realistic talking head videos from a single portrait and speech audio, enabling fast, high-quality, and long video synthesis with natural facial movements.
Contribution
DAWN is described as the first method to apply non-autoregressive diffusion to all-at-once generation of dynamic-length talking head videos, with the aim of improving generation speed and long-term consistency.
Findings
Produces vivid, authentic talking head videos with precise lip sync.
Achieves high generation speed and strong extrapolation for long videos.
Demonstrates superior quality over autoregressive methods.
Abstract
Talking head generation intends to produce vivid and realistic talking head videos from a single portrait and speech audio clip. Although significant progress has been made in diffusion-based talking head generation, almost all methods rely on autoregressive strategies, which suffer from limited context utilization beyond the current generation step, error accumulation, and slower generation speed. To address these challenges, we present DAWN (Dynamic frame Avatar With Non-autoregressive diffusion), a framework that enables all-at-once generation of dynamic-length video sequences. Specifically, it consists of two main components: (1) audio-driven holistic facial dynamics generation in the latent motion space, and (2) audio-driven head pose and blink generation. Extensive experiments demonstrate that our method generates authentic and vivid videos with precise lip motions, and natural…
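The speed argument in the abstract can be illustrated with a toy cost model. The sketch below is not taken from the paper: the function name, the clip length, and the number of denoising steps are all hypothetical. It only counts how many denoiser calls must run one after another, assuming an autoregressive sampler runs a full denoising loop per clip, while a non-autoregressive sampler denoises every frame jointly in a single loop.

```python
import math


def sequential_denoiser_calls(num_frames, clip_len, denoise_steps, autoregressive):
    """Count denoiser calls that must run sequentially to produce num_frames frames.

    Toy cost model (an assumption for illustration, not the paper's method):
    - autoregressive: the video is split into clips of clip_len frames, and each
      clip runs its own full denoising loop after the previous clip has finished.
    - non-autoregressive: all frames are denoised jointly in one loop.
    """
    if autoregressive:
        num_clips = math.ceil(num_frames / clip_len)
        return num_clips * denoise_steps
    return denoise_steps


# Hypothetical settings: 200 frames, 40-frame clips, 50 denoising steps.
ar_calls = sequential_denoiser_calls(200, 40, 50, autoregressive=True)
nar_calls = sequential_denoiser_calls(200, 40, 50, autoregressive=False)
print(ar_calls, nar_calls)
```

Under this model the sequential depth of the autoregressive sampler grows with video length, while the non-autoregressive one stays fixed. Each non-autoregressive call processes more frames at once, so the count reflects sequential depth only, not total compute.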
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
Methods: Diffusion
