SPACE: Speech-driven Portrait Animation with Controllable Expression

Siddharth Gururani; Arun Mallya; Ting-Chun Wang; Rafael Valle; Ming-Yu Liu

arXiv:2211.09809·cs.CV·December 8, 2022

Open Access

TL;DR

SPACE is a novel speech-driven portrait animation method that generates high-quality, expressive videos from a single image with controllable emotions and head poses, outperforming prior approaches.

Contribution

The paper introduces SPACE, a multi-stage framework combining facial landmark control with a pretrained face generator for realistic, controllable portrait animation from speech and a single image.

Findings

01 Outperforms prior methods in image quality metrics

02 Achieves realistic lip sync and facial expressions

03 User studies favor SPACE over existing techniques

Abstract

Animating portraits using speech has received growing attention in recent years, with various creative and practical use cases. An ideal generated video should have good lip sync with the audio, natural facial expressions and head motions, and high frame quality. In this work, we present SPACE, which uses speech and a single image to generate high-resolution, expressive videos with realistic head pose, without requiring a driving video. It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator. SPACE also allows for the control of emotions and their intensities. Our method outperforms prior methods in objective metrics for image quality and facial motions and is strongly preferred by users in pair-wise comparisons. The project website is available at https://deepimagination.cc/SPACE/

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation