FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning

Kazi Injamamul Haque; Zerrin Yumak

arXiv:2303.05416 · cs.CV · March 10, 2023 · 1 citation

TL;DR

FaceXHuBERT is a text-less, speech-driven 3D facial animation method built on a self-supervised pretrained HuBERT speech model. It captures subtle cues in speech, is robust to background noise, and is reported to outperform the state of the art in realism and speed without relying on text or complex models.

Contribution

The paper introduces FaceXHuBERT, which uses a self-supervised pretrained HuBERT model to generate 3D facial animation from speech alone, capturing subtle speech cues and improving robustness without large datasets or complex models.

Findings

01 Judged more realistic than the state of the art 78% of the time in a perceptual user study.

02 Four times faster than previous methods.

03 Captures personalized and subtle cues in speech, such as identity, emotion and hesitation.

Abstract

This paper presents FaceXHuBERT, a text-less speech-driven 3D facial animation generation method that captures personalized and subtle cues in speech (e.g. identity, emotion and hesitation). It is also very robust to background noise and can handle audio recorded in a variety of situations (e.g. multiple people speaking). Recent approaches employ end-to-end deep learning that takes both audio and text as input to generate facial animation for the whole face. However, the scarcity of publicly available expressive audio-3D facial animation datasets poses a major bottleneck. The resulting animations still have issues regarding accurate lip-syncing, expressivity, person-specific information and generalizability. We effectively employ a self-supervised pretrained HuBERT model in the training process that allows us to incorporate both lexical and non-lexical information in the…
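To make the input/output contract described in the abstract concrete, the toy sketch below maps a sequence of per-frame speech feature vectors to a sequence of 3D face meshes. It is not the paper's architecture: random vectors stand in for HuBERT features, a single linear layer stands in for the decoder, and every name and size (`animate`, `T`, `D`, `V`) is hypothetical.

```python
import random


def animate(speech_features, weights, template):
    """Map per-frame speech features to per-frame 3D vertex positions.

    speech_features: list of T frames, each a list of D floats
                     (a stand-in for speech-encoder output)
    weights:         D rows of V*3 floats (hypothetical linear decoder)
    template:        list of V (x, y, z) neutral-face vertices
    """
    frames = []
    for feat in speech_features:
        # Linear decoder: one displacement value per vertex coordinate.
        offsets = [
            sum(f * w_row[j] for f, w_row in zip(feat, weights))
            for j in range(len(template) * 3)
        ]
        # Add the displacements to the neutral template for this frame.
        frames.append([
            (x + offsets[3 * i], y + offsets[3 * i + 1], z + offsets[3 * i + 2])
            for i, (x, y, z) in enumerate(template)
        ])
    return frames


# Illustrative sizes only: 4 audio frames, 8-dim features, 5 vertices.
random.seed(0)
T, D, V = 4, 8, 5
features = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
weights = [[random.uniform(-0.01, 0.01) for _ in range(V * 3)] for _ in range(D)]
template = [(float(i), 0.0, 0.0) for i in range(V)]

frames = animate(features, weights, template)
print(len(frames), len(frames[0]))  # prints: 4 5
```

The output has one mesh per audio frame, and an all-zero feature vector leaves the neutral template unchanged, which is the behaviour one would expect from a displacement-based decoder.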

Code & Models

Repositories

galib360/facexhubert
jax · Official

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Human Motion and Animation

Methods: Sequence to Sequence