Speech-driven facial animation using polynomial fusion of features

Triantafyllos Kefalas; Konstantinos Vougioukas; Yannis Panagakis; Stavros Petridis; Jean Kossaifi; Maja Pantic

arXiv:1912.05833 · cs.LG · February 20, 2020

TL;DR

This paper introduces a polynomial fusion layer for speech-driven facial animation, capturing higher-order feature interactions to improve video realism, synchronization, and natural blinking in generated talking face videos.

Contribution

It proposes a novel polynomial fusion layer with tensor decomposition to model complex feature interactions in facial animation from speech signals.

Findings

01 Improved video quality metrics

02 Enhanced audiovisual synchronization

03 More natural blinking in generated videos

Abstract

Speech-driven facial animation involves using a speech signal to generate realistic videos of talking faces. Recent deep learning approaches to facial synthesis rely on extracting low-dimensional representations and concatenating them, followed by a decoding step of the concatenated vector. This accounts for only first-order interactions of the features and ignores higher-order interactions. In this paper we propose a polynomial fusion layer that models the joint representation of the encodings by a higher-order polynomial, with the parameters modelled by a tensor decomposition. We demonstrate the suitability of this approach through experiments on generated videos evaluated on a range of metrics on video quality, audiovisual synchronisation and generation of blinks.
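To make the contrast with plain concatenation concrete, the following is a minimal sketch of a second-order polynomial fusion in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dimensions, the restriction to second order, and the choice of a CP (rank-R) factorisation for the quadratic weight tensor are all assumptions made for this example. The first-order term is what a linear layer on the concatenated vector would compute; the second-order term adds pairwise products of features without ever storing the full weight tensor.

```python
import numpy as np


def init_params(d, k, rank, rng):
    """Random parameters for a second-order fusion layer (illustrative only).

    d: size of the concatenated encoding, k: output size,
    rank: assumed CP rank of the second-order weight tensor.
    """
    return {
        "b": np.zeros(k),                              # bias
        "W1": rng.normal(scale=0.1, size=(k, d)),      # first-order weights
        "U1": rng.normal(scale=0.1, size=(d, rank)),   # CP factor, first input mode
        "U2": rng.normal(scale=0.1, size=(d, rank)),   # CP factor, second input mode
        "C": rng.normal(scale=0.1, size=(k, rank)),    # CP factor, output mode
    }


def polynomial_fusion(encodings, p):
    """Fuse encodings as z = b + W1 x + sum_r C[:, r] (U1[:, r].x)(U2[:, r].x)."""
    x = np.concatenate(encodings)
    first_order = p["W1"] @ x
    # The factorised form avoids building the (k, d, d) tensor explicitly.
    second_order = p["C"] @ ((p["U1"].T @ x) * (p["U2"].T @ x))
    return p["b"] + first_order + second_order


def dense_second_order(x, p):
    """Reference: rebuild the full (k, d, d) tensor and contract it with x twice."""
    W2 = np.einsum("or,ir,jr->oij", p["C"], p["U1"], p["U2"])
    return np.einsum("oij,i,j->o", W2, x, x)


# Hypothetical usage: two encodings of sizes 3 and 4 fused into 5 features.
rng = np.random.default_rng(0)
params = init_params(d=7, k=5, rank=2, rng=rng)
enc_a, enc_b = rng.normal(size=3), rng.normal(size=4)
fused = polynomial_fusion([enc_a, enc_b], params)
```

With the factorisation, the second-order term needs (2d + k) * rank parameters instead of k * d * d, which is the practical reason for modelling the polynomial's parameters through a tensor decomposition.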
