SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model
Weipeng Tan, Chuming Lin, Chengming Xu, Xiaozhong Ji, Junwei Zhu, Chengjie Wang, Yunsheng Wu, Yanwei Fu

TL;DR
This paper introduces SVP, a diffusion model-based talking head generation framework that incorporates personalized style information, resulting in more diverse, vivid, and high-quality videos with controllable styles.
Contribution
The paper proposes a novel probabilistic style prior learning method and fine-tunes a pretrained Stable Diffusion model to enhance style diversity in talking head videos.
Findings
Generated videos are more diverse and vivid than those of diffusion-based methods that ignore intrinsic style.
The method outperforms existing state-of-the-art approaches.
Flexible control over intrinsic styles is achieved.
Abstract
Talking Head Generation (THG), typically driven by audio, is an important and challenging task with broad application prospects in fields such as digital humans, film production, and virtual reality. While diffusion model-based THG methods deliver high-quality and stable content generation, they often overlook the intrinsic style of a video, which encompasses personalized features such as speaking habits and facial expressions. As a consequence, the generated video content lacks diversity and vividness, which limits its use in real-life scenarios. To address these issues, we propose a novel framework named Style-Enhanced Vivid Portrait (SVP), which fully leverages style-related information in THG. Specifically, we first introduce a novel probabilistic style prior learning method that models the intrinsic style as a Gaussian distribution using facial expressions and audio embedding. The…
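To make the idea of modeling intrinsic style as a Gaussian distribution concrete, here is a minimal sketch in plain Python. It is not the authors' architecture, which the truncated abstract does not describe. It assumes a diagonal Gaussian whose mean and log-variance come from two hypothetical linear layers over concatenated expression and audio features, and it draws a style vector with the standard reparameterization trick. All names (`style_prior`, `sample_style`, `init_params`), dimensions, and weights are illustrative assumptions.

```python
import math
import random


def linear(x, weights, bias):
    """Apply a dense layer y = W x + b to a plain Python list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(in_dim, style_dim, rng):
    """Create small random weights for the two hypothetical heads (mean, log-variance)."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(style_dim)]
    return {
        "w_mu": matrix(), "b_mu": [0.0] * style_dim,
        "w_logvar": matrix(), "b_logvar": [0.0] * style_dim,
    }


def style_prior(expr_feat, audio_feat, params):
    """Map concatenated features to the mean and log-variance of a diagonal Gaussian."""
    x = list(expr_feat) + list(audio_feat)
    mu = linear(x, params["w_mu"], params["b_mu"])
    log_var = linear(x, params["w_logvar"], params["b_logvar"])
    return mu, log_var


def sample_style(mu, log_var, rng, scale=1.0):
    """Reparameterized sample: z = mu + scale * sigma * eps, with eps ~ N(0, 1)."""
    return [m + scale * math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


# Toy inputs standing in for expression and audio embeddings (assumed sizes).
rng = random.Random(0)
params = init_params(in_dim=6, style_dim=3, rng=rng)
expr = [0.1, -0.2, 0.3]
audio = [0.5, 0.0, -0.4]

mu, log_var = style_prior(expr, audio, params)
z_a = sample_style(mu, log_var, random.Random(1))
z_b = sample_style(mu, log_var, random.Random(2))
z_mean = sample_style(mu, log_var, random.Random(3), scale=0.0)
```

Sampling twice from the same distribution yields two different style vectors for the same input, which is one way a probabilistic prior could produce diverse outputs. Setting `scale=0.0` collapses the sample to the mean, giving a deterministic style.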
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Music Technology and Sound Studies
Methods: Diffusion
