Modality Dropout for Improved Performance-driven Talking Faces
Ahmed Hussen Abdelaziz, Barry-John Theobald, Paul Dixon, Reinhard Knothe, Nicholas Apostoloff, Sachin Kajareker

TL;DR
This paper presents a deep learning method that uses modality dropout during training to improve the realism and accuracy of animated talking faces driven by audiovisual data. The trained model is suitable for resource-limited devices.
Contribution
The introduction of modality dropout in training enhances audiovisual face animation performance without relying on speech transcription or extensive hardware.
Findings
After modality dropout is introduced, viewers prefer audiovisual-driven animation over the video-only equivalent in 74% of cases.
Modality dropout significantly improves viewer preference for audiovisual animations.
The trained model runs in real time on resource-limited hardware such as a smartphone.
Abstract
We describe our novel deep learning approach for driving animated faces using both acoustic and visual information. In particular, speech-related facial movements are generated using audiovisual information, and non-speech facial movements are generated using only visual information. To ensure that our model exploits both modalities during training, batches are generated that contain audio-only, video-only, and audiovisual input features. The probability of dropping a modality allows control over the degree to which the model exploits audio and visual information during training. Our trained model runs in real-time on resource-limited hardware (e.g., a smartphone), it is user agnostic, and it is not dependent on a potentially error-prone transcription of the speech. We use subjective testing to demonstrate: 1) the improvement of audiovisual-driven animation over the equivalent…
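The batch construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `modality_dropout`, the parameters `p_audio_only` and `p_video_only`, and the choice to represent a dropped modality by zeroing its features are all assumptions made for illustration. The summary above does not say how a dropped modality is encoded.

```python
import random


def modality_dropout(audio_batch, video_batch, p_audio_only, p_video_only, seed=None):
    """Randomly turn each example into audio-only, video-only, or audiovisual.

    Hypothetical sketch: a "dropped" modality is replaced by zeros (an
    assumption, not taken from the paper). At most one modality is dropped
    per example, so every example keeps at least one input stream.
    """
    if p_audio_only < 0 or p_video_only < 0 or p_audio_only + p_video_only > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    if len(audio_batch) != len(video_batch):
        raise ValueError("audio and video batches must have the same length")

    rng = random.Random(seed)
    out_audio, out_video, modes = [], [], []
    for audio, video in zip(audio_batch, video_batch):
        u = rng.random()  # uniform draw in [0, 1)
        if u < p_audio_only:
            mode = "audio_only"
            video = [0.0] * len(video)  # drop the visual features
        elif u < p_audio_only + p_video_only:
            mode = "video_only"
            audio = [0.0] * len(audio)  # drop the acoustic features
        else:
            mode = "audiovisual"  # keep both modalities
        out_audio.append(list(audio))
        out_video.append(list(video))
        modes.append(mode)
    return out_audio, out_video, modes


if __name__ == "__main__":
    audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    video = [[1.0], [2.0], [3.0]]
    a, v, modes = modality_dropout(audio, video, p_audio_only=0.3, p_video_only=0.3, seed=0)
    for mode, a_i, v_i in zip(modes, a, v):
        print(mode, a_i, v_i)
```

Raising `p_audio_only` in this sketch exposes the model to more examples where only audio is available, which is one way to read the abstract's statement that the drop probability controls how much the model exploits each modality.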
Taxonomy
Methods: Dropout
