StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma; Suzhen Wang; Zhipeng Hu; Changjie Fan; Tangjie Lv; Yu Ding; Zhidong Deng; Xin Yu

arXiv:2301.01081·cs.CV·June 13, 2023

TL;DR

StyleTalk introduces a novel framework for one-shot talking head generation that allows controllable speaking styles by extracting style from reference videos and integrating it into synthesized videos using a style-aware transformer.

Contribution

The paper proposes a style encoder, style-controllable decoder, and style-aware transformer to enable diverse, style-controllable talking head synthesis from a single image and audio.
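The data flow described above (a reference clip is reduced to a style code, which then conditions the animation generated from audio) can be sketched with a toy example. This is only an illustration of the interface, not the paper's model: the function names `encode_style` and `decode_with_style`, the mean-pooling encoder, and the additive decoder are all assumptions made here for clarity, standing in for the learned networks.

```python
from typing import List

Vector = List[float]


def encode_style(reference_motion: List[Vector]) -> Vector:
    """Toy stand-in for a style encoder (hypothetical).

    Averages per-frame motion features of a reference clip into one
    fixed-length style code, so clips of any length map to the same size.
    """
    if not reference_motion:
        raise ValueError("reference clip must contain at least one frame")
    num_frames = len(reference_motion)
    dim = len(reference_motion[0])
    return [sum(frame[i] for frame in reference_motion) / num_frames
            for i in range(dim)]


def decode_with_style(audio_features: List[Vector], style_code: Vector) -> List[Vector]:
    """Toy stand-in for a style-controllable decoder (hypothetical).

    Produces one animation frame per audio frame by offsetting the audio
    features with the style code; a real decoder would be a learned network.
    """
    return [[a + s for a, s in zip(frame, style_code)]
            for frame in audio_features]


# Two-frame reference clip and three-frame driving audio (made-up numbers).
reference_clip = [[0.0, 2.0], [1.0, 4.0]]
driving_audio = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

style_code = encode_style(reference_clip)
animation = decode_with_style(driving_audio, style_code)
```

The point of the sketch is the separation of concerns: the style code depends only on the reference clip, and the output length depends only on the driving audio, so the same audio can be paired with different style codes.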

Findings

01. Generates diverse speaking styles from a single portrait image and an audio clip.

02. Produces realistic, authentic-looking visual results in the generated videos.

03. Outperforms existing methods in style controllability and visual quality.

Abstract

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to…

Code & Models

Repositories

fuxivirtualhuman/styletalk (PyTorch, official)

Models

🤗 ameerazam08/styletalk (model, ♡ 1)

Videos

StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Motion and Animation

Methods: Contrastive Language-Image Pre-training