Embedded Representation Learning Network for Animating Styled Video Portrait

Tianyong Wang, Xiangyu Liang, Wangguandong Zheng, Dan Niu, Haifeng Xia, and Siyu Xia

arXiv:2404.19038·cs.CV·May 1, 2024

TL;DR

This paper introduces ERLNet, a novel two-stage generative framework combining audio-driven facial expression synthesis and a dual-branch fusion NeRF to produce realistic, style-controllable talking head videos with reduced artifacts.

Contribution

The paper proposes ERLNet, a new generative paradigm with two stages that improves style control and reduces artifacts in talking head synthesis using NeRF-based methods.

Findings

01. Produces more realistic talking head videos.
02. Effectively controls style and expression.
03. Reduces displacement artifacts around the neck.

Abstract

Talking head generation has recently attracted considerable attention due to its broad application prospects, especially for digital avatars and 3D animation design. Motivated by this practical demand, several works have explored Neural Radiance Fields (NeRF) to synthesize talking heads. However, these NeRF-based methods face two challenges: (1) difficulty in generating style-controllable talking heads, and (2) displacement artifacts around the neck in rendered images. To overcome these two challenges, we propose a novel generative paradigm, Embedded Representation Learning Network (ERLNet), with two learning stages. First, the audio-driven FLAME (ADF) module is constructed to produce facial expression and head pose sequences synchronized with content audio and style video. Second, given the sequence deduced by the ADF, a novel dual-branch fusion NeRF…

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Video Analysis and Summarization