AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Huawei Wei, Zejun Yang, Zhisheng Wang

TL;DR
AniPortrait is a framework that synthesizes photorealistic portrait animation from audio and a reference portrait image. It works in two stages: 3D facial representations are extracted from the audio and projected into 2D landmarks, and a diffusion model then renders those landmarks as realistic video frames.
Contribution
A novel two-stage approach that extracts 3D intermediate facial representations from audio and then uses a diffusion model to render high-quality, controllable portrait animation.
Findings
Outperforms existing methods in facial naturalness and visual quality.
Demonstrates high pose diversity and temporal consistency.
Shows potential for facial motion editing and face reenactment.
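The abstract attributes temporal consistency to a motion module attached to the diffusion model but does not describe its internals here. A common design for such modules (used, for example, in AnimateDiff) is self-attention along the time axis, applied independently at each spatial position. The sketch below assumes that design, drops the learned query/key/value projections for brevity, and uses hypothetical names (`temporal_self_attention`, `motion_module`); it is an illustration of the idea, not AniPortrait's code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_self_attention(frames):
    """frames: list of T feature vectors (each of length D) for ONE spatial position.
    Every frame attends to every frame in the clip (identity Q/K/V, an assumption),
    and the attended result is added back as a residual."""
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in frames:
        # Scaled dot-product score between this frame and each frame in the clip
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in frames]
        weights = softmax(scores)
        # Weighted average of all frames' features
        mixed = [sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)]
        out.append([qj + mj for qj, mj in zip(q, mixed)])
    return out

def motion_module(video):
    """video[t][n] is the D-dim feature of spatial position n at frame t.
    Attention mixes information across time only; positions never mix."""
    T, N = len(video), len(video[0])
    result = [[None] * N for _ in range(T)]
    for n in range(N):
        seq = [video[t][n] for t in range(T)]
        for t, vec in enumerate(temporal_self_attention(seq)):
            result[t][n] = vec
    return result
```

Because each position only sees its own trajectory over time, the layer smooths features across frames without blurring spatial detail, which is one reason this factorization is popular for video diffusion.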
Abstract
In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. First, we extract 3D intermediate representations from audio and project them into a sequence of 2D facial landmarks. Subsequently, we employ a robust diffusion model, coupled with a motion module, to convert the landmark sequence into photorealistic and temporally consistent portrait animation. Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality, thereby offering an enhanced perceptual experience. Moreover, our methodology offers considerable flexibility and controllability and can be applied to tasks such as facial motion editing and face reenactment. We release code and model…
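The abstract states that 3D intermediate representations are projected into a sequence of 2D facial landmarks but does not give the camera model on this page. The sketch below assumes a simple pinhole camera and a per-frame rigid head pose (rotation `R`, translation `t`); the function names, the yaw-only rotation helper, and all numeric values are hypothetical and not taken from AniPortrait.

```python
import math

def rotation_yaw(theta):
    """3x3 rotation matrix about the vertical (y) axis; theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project_landmarks(points3d, rotation, translation, focal, center):
    """Pinhole projection of 3D landmarks (x, y, z) into 2D pixel coordinates (u, v)."""
    projected = []
    for p in points3d:
        # Apply the rigid head pose: camera-space point = R @ p + t
        cam = [sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
               for i in range(3)]
        if cam[2] <= 0:
            raise ValueError("landmark lies behind the camera")
        # Perspective divide, then shift to the principal point
        u = focal * cam[0] / cam[2] + center[0]
        v = focal * cam[1] / cam[2] + center[1]
        projected.append((u, v))
    return projected

def project_sequence(frames3d, poses, focal, center):
    """Project each frame's 3D landmark set with that frame's (R, t) pose."""
    return [project_landmarks(pts, R, t, focal, center)
            for pts, (R, t) in zip(frames3d, poses)]
```

For example, with an identity rotation, `t = (0, 0, 5)`, `focal = 500`, and `center = (256, 256)`, the point `(1, 0, 0)` lands at `(356.0, 256.0)`. Keeping pose separate from the landmark shape is what lets a pipeline like this swap in different head motions for the same speech-driven expression.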
Taxonomy
Topics: 3D Surveying and Cultural Heritage
Methods: Diffusion
