Modality Dropout for Improved Performance-driven Talking Faces

Ahmed Hussen Abdelaziz; Barry-John Theobald; Paul Dixon; Reinhard Knothe; Nicholas Apostoloff; Sachin Kajareker

arXiv:2005.13616·eess.AS·May 29, 2020

TL;DR

This paper presents a deep learning method for animating talking faces from audiovisual input. Applying modality dropout during training improves the realism and accuracy of the animation, and the trained model is suitable for resource-limited devices.

Contribution

Introducing modality dropout during training improves audiovisual face animation without relying on speech transcription or high-end hardware.

Findings

01

After modality dropout is introduced, viewers prefer audiovisual-driven animation over video-only animation in 74% of cases.

02

Modality dropout increases viewer preference for audiovisual-driven animation.

03

The model runs in real time on resource-limited hardware.

Abstract

We describe our novel deep learning approach for driving animated faces using both acoustic and visual information. In particular, speech-related facial movements are generated using audiovisual information, and non-speech facial movements are generated using only visual information. To ensure that our model exploits both modalities during training, batches are generated that contain audio-only, video-only, and audiovisual input features. The probability of dropping a modality allows control over the degree to which the model exploits audio and visual information during training. Our trained model runs in real time on resource-limited hardware (e.g., a smartphone), is user-agnostic, and does not depend on a potentially error-prone transcription of the speech. We use subjective testing to demonstrate: 1) the improvement of audiovisual-driven animation over the equivalent…
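The batch construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a dropped modality is replaced by zeros, that at most one modality is dropped per example, and that the drop decision is sampled independently for each example. The names `modality_dropout`, `make_batch`, `p_drop_audio`, and `p_drop_video` are hypothetical and chosen only for illustration.

```python
import random


def modality_dropout(audio, video, p_drop_audio, p_drop_video, rng):
    """Return (audio, video, mode) with at most one modality zeroed out.

    Assumption: a dropped modality is replaced by zeros. The abstract does
    not say how dropped features are represented.
    """
    if p_drop_audio < 0 or p_drop_video < 0 or p_drop_audio + p_drop_video > 1:
        raise ValueError("drop probabilities must be >= 0 and sum to at most 1")

    u = rng.random()  # uniform sample in [0, 1)
    if u < p_drop_audio:
        # Audio dropped: the model sees visual features only.
        return [0.0] * len(audio), list(video), "video-only"
    if u < p_drop_audio + p_drop_video:
        # Video dropped: the model sees acoustic features only.
        return list(audio), [0.0] * len(video), "audio-only"
    # Neither dropped: the model sees both modalities.
    return list(audio), list(video), "audiovisual"


def make_batch(examples, p_drop_audio, p_drop_video, seed=0):
    """Apply modality dropout to every (audio, video) pair in a batch."""
    rng = random.Random(seed)
    return [
        modality_dropout(a, v, p_drop_audio, p_drop_video, rng)
        for a, v in examples
    ]


# Toy features: 3 acoustic values and 2 visual values per example.
examples = [([0.1, 0.2, 0.3], [0.9, 0.8]) for _ in range(1000)]
batch = make_batch(examples, p_drop_audio=0.3, p_drop_video=0.3)
modes = {mode for _, _, mode in batch}
```

Raising `p_drop_video` in this sketch exposes the model to more audio-only examples, and raising `p_drop_audio` does the opposite. This is one way to read the abstract's statement that the drop probability controls how strongly the model exploits each modality.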

Taxonomy

Methods: Dropout