DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Hanbo Cheng; Limin Lin; Chenyu Liu; Pengcheng Xia; Pengfei Hu; Jiefeng Ma; Jun Du; Jia Pan

arXiv:2410.13726 · cs.CV · March 27, 2025

TL;DR

DAWN introduces a non-autoregressive diffusion framework for generating realistic talking head videos from a single portrait and speech audio, enabling fast, high-quality, and long video synthesis with natural facial movements.

Contribution

DAWN is the first to apply non-autoregressive diffusion to all-at-once generation of dynamic-length talking head videos, improving generation speed and long-term consistency.

Findings

01 Produces vivid, authentic talking head videos with precise lip sync.

02 Achieves high generation speed and strong extrapolation for long videos.

03 Demonstrates superior quality over autoregressive methods.

Abstract

Talking head generation intends to produce vivid and realistic talking head videos from a single portrait and speech audio clip. Although significant progress has been made in diffusion-based talking head generation, almost all methods rely on autoregressive strategies, which suffer from limited context utilization beyond the current generation step, error accumulation, and slower generation speed. To address these challenges, we present DAWN (Dynamic frame Avatar With Non-autoregressive diffusion), a framework that enables all-at-once generation of dynamic-length video sequences. Specifically, it consists of two main components: (1) audio-driven holistic facial dynamics generation in the latent motion space, and (2) audio-driven head pose and blink generation. Extensive experiments demonstrate that our method generates authentic and vivid videos with precise lip motions, and natural…
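
The speed argument in the abstract can be illustrated with a small counting sketch. This is not code from the DAWN repository: the function name `sequential_denoiser_calls`, the clip length of 16 frames, and the 50 sampling steps are hypothetical values chosen only for illustration. It counts how many denoiser calls must run one after another when a diffusion sampler produces a video clip by clip (autoregressive) versus in a single pass over all frames (non-autoregressive).

```python
import math


def sequential_denoiser_calls(num_frames, sampling_steps, clip_len=None):
    """Count sequential denoiser calls for a toy diffusion video sampler.

    clip_len=None models all-at-once generation: one sampling run covers
    every frame. Otherwise, clips of `clip_len` frames are generated one
    after another, each needing its own full sampling run.
    All parameter values are illustrative assumptions, not DAWN settings.
    """
    if num_frames <= 0 or sampling_steps <= 0:
        raise ValueError("num_frames and sampling_steps must be positive")
    if clip_len is None:
        # Non-autoregressive: every step denoises the whole sequence at once.
        return sampling_steps
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    # Autoregressive: each clip waits for the previous clip to finish.
    num_clips = math.ceil(num_frames / clip_len)
    return num_clips * sampling_steps


autoregressive = sequential_denoiser_calls(250, 50, clip_len=16)
all_at_once = sequential_denoiser_calls(250, 50)
print(autoregressive, all_at_once)
```

The count covers sequential calls only. Each all-at-once call processes more frames than a single clip-level call, so the ratio is not a wall-clock speedup estimate. The sketch also does not model error accumulation or context use, the other two limitations the abstract attributes to autoregressive strategies.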

Code & Models

Repositories

hanbo-cheng/dawn-pytorch (PyTorch, official)

Models

Hanbo-Cheng/DAWN (Hugging Face model, ♡ 6)

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis

Methods: Diffusion