SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model

Weipeng Tan; Chuming Lin; Chengming Xu; Xiaozhong Ji; Junwei Zhu; Chengjie Wang; Yunsheng Wu; Yanwei Fu

arXiv:2409.03270 · cs.CV · December 2, 2024

Open Access

TL;DR

This paper introduces SVP, a diffusion model-based talking head generation framework that incorporates personalized style information, resulting in more diverse, vivid, and high-quality videos with controllable styles.

Contribution

The paper proposes a novel probabilistic style prior learning method and fine-tunes a pretrained Stable Diffusion model to enhance style diversity in talking head videos.

Findings

1. Generated videos are more diverse and vivid.
2. The method outperforms existing state-of-the-art approaches.
3. The framework enables flexible control over intrinsic style.

Abstract

Talking Head Generation (THG), typically driven by audio, is an important and challenging task with broad application prospects in fields such as digital humans, film production, and virtual reality. While diffusion model-based THG methods generate high-quality and stable content, they often overlook the intrinsic style of a video, which encompasses personalized features such as speaking habits and facial expressions. As a consequence, the generated videos lack diversity and vividness, which limits their use in real-life scenarios. To address these issues, we propose a novel framework named Style-Enhanced Vivid Portrait (SVP), which fully leverages style-related information in THG. Specifically, we first introduce a novel probabilistic style prior learning approach that models the intrinsic style as a Gaussian distribution using facial expressions and audio embeddings. The…
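To make the idea of modeling style as a Gaussian distribution more concrete, here is a minimal, dependency-free sketch of a generic Gaussian style head. It pools per-frame expression and audio features, predicts a mean and log-variance, draws a style code with the reparameterization trick, and computes a KL term toward N(0, I). Everything here is an illustrative assumption rather than SVP's actual design: the function names (`style_gaussian`, `sample_style`, `kl_to_standard_normal`), temporal mean pooling, the bias-free linear heads, and the VAE-style KL regularizer are all hypothetical, since the abstract does not specify the architecture.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average per-frame feature vectors over time (assumed pooling, not from the paper)."""
    n = len(frames)
    return [sum(frame[d] for frame in frames) / n for d in range(len(frames[0]))]


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product without bias: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def style_gaussian(expr_frames, audio_frames, w_mu: Matrix, w_logvar: Matrix) -> Tuple[Vector, Vector]:
    """Hypothetical style head: pooled expression and audio features -> (mu, log sigma^2)."""
    feats = mean_pool(expr_frames) + mean_pool(audio_frames)  # concatenate the two pooled vectors
    return linear(w_mu, feats), linear(w_logvar, feats)


def sample_style(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> Vector:
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions (assumed regularizer)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


if __name__ == "__main__":
    expr = [[0.1, 0.3], [0.3, 0.1]]        # 2 frames x 2 expression coefficients (toy values)
    audio = [[1.0, 0.0], [0.0, 1.0]]       # 2 frames x 2 audio-embedding dims (toy values)
    w_mu = [[1, 0, 0, 0], [0, 0, 1, 0]]    # maps the 4 pooled features to a 2-D style code
    w_logvar = [[0, 0, 0, 0], [0, 0, 0, 0]]
    mu, logvar = style_gaussian(expr, audio, w_mu, w_logvar)
    z = sample_style(mu, logvar, random.Random(0))
    print("mu:", mu, "z:", z, "KL:", kl_to_standard_normal(mu, logvar))
```

Sampling `z` instead of always using `mu` lets the same reference clip yield different but plausible style codes, which is one way a distribution over style, rather than a single style vector, could support diversity. In a real system, the heads would be learned networks and the style code would condition the diffusion model. How SVP does this is not described in the abstract above.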

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Music Technology and Sound Studies

Methods: Diffusion