# Synthesising 3D Facial Motion from "In-the-Wild" Speech

**Authors:** Panagiotis Tzirakis, Athanasios Papaioannou, Alexander Lattas, Michail Tarasiou, Björn Schuller, Stefanos Zafeiriou

arXiv: 1904.07002 · 2019-04-16

## TL;DR

This paper presents a deep learning approach for synthesising 3D facial motion from speech recorded in unconstrained, real-world conditions, using a new dataset and a time-warping technique called Deep Canonical Attentional Warping (DCAW).

## Contribution

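It introduces the first method for speaker-independent 3D facial motion synthesis from in-the-wild speech. The method relies on newly captured 4D sequences of people uttering the 500 words of the Lip Reading Words (LRW) dataset, a set of 3D blendshapes built for speech, and a novel end-to-end time-warping technique called DCAW that aligns the blendshape parameters with the LRW audio.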

## Key findings

- Successfully synthesizes 3D facial motion from in-the-wild speech.
- Handles different speakers and continuous speech signals.
- Demonstrates robustness in uncontrolled recording conditions.

## Abstract

Synthesising 3D facial motion from speech is a crucial problem manifesting in a multitude of applications such as computer games and movies. Recently proposed methods tackle this problem in controlled conditions of speech. In this paper, we introduce the first methodology for 3D facial motion synthesis from speech captured in arbitrary recording conditions ("in-the-wild") and independent of the speaker. For our purposes, we captured 4D sequences of people uttering 500 words, contained in Lip Reading Words (LRW), a publicly available large-scale in-the-wild dataset, and built a set of 3D blendshapes appropriate for speech. We correlate the 3D shape parameters of the speech blendshapes to the LRW audio samples by means of a novel time-warping technique, named Deep Canonical Attentional Warping (DCAW), that can simultaneously learn hierarchical non-linear representations and a warping path in an end-to-end manner. We thoroughly evaluate our proposed methods, and show the ability of a deep learning model to synthesise 3D facial motion in handling different speakers and continuous speech signals in uncontrolled conditions.
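
To make the notion of a "warping path" concrete, the sketch below implements classic dynamic time warping (DTW) between two feature sequences of different lengths. This is not DCAW: the abstract describes DCAW as learning non-linear representations and the warping path jointly, whereas plain DTW aligns fixed features with dynamic programming. The function name `dtw_path` and the toy inputs are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def dtw_path(x, y):
    """Classic DTW between sequences x (n, d) and y (m, d).

    Illustrative only: a hypothetical helper, not the paper's DCAW.
    Returns the accumulated alignment cost and the warping path as a
    list of (index_in_x, index_in_y) pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # Pairwise Euclidean distance between every frame of x and of y.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

    # acc[i, j] = minimal cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance both sequences
                acc[i - 1, j],      # advance x only
                acc[i, j - 1],      # advance y only
            )

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path


# Toy stand-ins for a shape-parameter track and an audio-feature track.
shape_params = [[0.0], [1.0], [2.0]]
audio_feats = [[0.0], [0.0], [1.0], [2.0]]
distance, path = dtw_path(shape_params, audio_feats)
```

In this toy case the second sequence simply holds its first frame for one extra step, so the path maps the first frame of `shape_params` to the first two frames of `audio_feats`. A learned method such as DCAW would replace the fixed Euclidean cost with representations trained end to end.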

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.07002/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1904.07002/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1904.07002/full.md

---
Source: https://tomesphere.com/paper/1904.07002