OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions

Jin Liu; Xi Wang; Xiaomeng Fu; Yesheng Chai; Cai Yu; Jiao Dai; Jizhong Han

arXiv:2309.16148·cs.CV·September 29, 2023

Open Access

TL;DR

OSM-Net introduces a one-to-many mapping approach for one-shot talking head generation, creating more natural and diverse head motions by constructing a rich motion space and sampling from it, addressing limitations of previous methods.

Contribution

The paper proposes OSM-Net, a novel network that models diverse head motions in one-shot talking head generation by constructing a motion space and enabling sampling for natural motion diversity.

Findings

01. Generates more realistic and diverse head motions.

02. Outperforms existing methods in naturalness and diversity.

03. Effective in modeling one-to-many head motion mappings.

Abstract

One-shot talking head generation has no explicit head movement reference, so it is difficult to generate talking heads with head motions. Some existing works only edit the mouth area and generate still talking heads, leading to unrealistic talking head performance. Other works construct a one-to-one mapping between the audio signal and head motion sequences, which introduces ambiguous correspondences into the mapping, since people can move their heads differently when speaking the same content. This unreasonable form of mapping fails to model the diversity and produces either nearly static or exaggerated head motions, which look unnatural and strange. The one-shot talking head generation task is therefore a one-to-many ill-posed problem, and people present diverse head motions when speaking. Based on the above observation, we propose OSM-Net, a one-to-many one-shot talking…
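The argument against one-to-one mappings can be illustrated with a toy example. The sketch below is not OSM-Net and does not reproduce any part of the paper's architecture; the function names, the scalar "yaw" motion, and the two-mode dataset are all hypothetical and chosen only for illustration. It assumes that the same audio input is paired with two opposite head motions, then compares a least-squares one-to-one predictor (which collapses to the mean of its targets) with simple sampling from the observed motions.

```python
import random
import statistics


def make_toy_dataset(n_samples=200, seed=0):
    """Hypothetical data: every sample has the same audio feature (0.0),
    but the head yaw is either a turn left (-1.0) or a turn right (+1.0)."""
    rng = random.Random(seed)
    return [(0.0, rng.choice([-1.0, 1.0])) for _ in range(n_samples)]


def fit_one_to_one(dataset):
    """One-to-one mapping trained with squared error.

    All inputs are identical, so the error-minimizing prediction is
    simply the mean of the target motions."""
    return statistics.fmean(motion for _, motion in dataset)


def sample_one_to_many(dataset, rng):
    """One-to-many mapping (toy version): draw one of the motions that
    was actually observed for this audio input."""
    return rng.choice([motion for _, motion in dataset])


dataset = make_toy_dataset()
one_to_one_motion = fit_one_to_one(dataset)

rng = random.Random(1)
sampled_motions = [sample_one_to_many(dataset, rng) for _ in range(10)]

print("one-to-one prediction:", one_to_one_motion)
print("one-to-many samples:  ", sampled_motions)
```

In this toy setting the one-to-one prediction lands near zero, which corresponds to a nearly static head, while each sampled motion is a full turn in one of the observed directions. A real system would sample from a learned motion space rather than from raw training targets.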

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation

Methods: Contrastive Language-Image Pre-training