VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization

Tao Liu; Ziyang Ma; Qi Chen; Feilong Chen; Shuai Fan; Xie Chen; Kai Yu

arXiv:2412.09892·cs.CV·December 19, 2024

TL;DR

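VQTalker introduces a vector quantization framework for multilingual talking head generation. It discretizes facial motion into tokens, motivated by the observation that sound units (phonemes) and their visual articulations (visemes) are often shared across languages, and this enables realistic, lip-synchronized facial animation across languages even with limited training data.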

Contribution

The paper proposes a facial motion tokenizer based on Group Residual Finite Scalar Quantization (GRFSQ) that captures facial movements in a discretized representation and generalizes across languages, advancing multilingual talking face synthesis.

Findings

01 Achieves state-of-the-art results in multilingual scenarios

02 Generates high-quality videos at 512x512 resolution with low bitrate

03 Demonstrates effective cross-lingual facial motion transfer

Abstract

We present VQTalker, a Vector Quantization-based framework for multilingual talking head generation that addresses the challenges of lip synchronization and natural motion across diverse languages. Our approach is grounded in the phonetic principle that human speech comprises a finite set of distinct sound units (phonemes) and corresponding visual articulations (visemes), which often share commonalities across languages. We introduce a facial motion tokenizer based on Group Residual Finite Scalar Quantization (GRFSQ), which creates a discretized representation of facial features. This method enables comprehensive capture of facial movements while improving generalization to multiple languages, even with limited training data. Building on this quantized representation, we implement a coarse-to-fine motion generation process that progressively refines facial animations. Extensive…
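The quantization idea named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (quantize_unit, grfsq, bits_per_frame), the level counts, the number of groups and stages, and the per-stage rescaling of the residual are all assumptions chosen for illustration. The sketch splits a feature vector into channel groups, rounds each channel to a small fixed grid (finite scalar quantization), and then re-quantizes what is left over at a finer scale (the residual stages). Odd level counts are assumed so that the grid is symmetric around zero.

```python
import numpy as np


def quantize_unit(x, levels):
    """Round values in [-1, 1] onto a uniform grid with `levels` points per channel.

    Assumes odd level counts. Returns the dequantized values and integer
    indices in [0, levels - 1].
    """
    half = (levels - 1) / 2.0
    codes = np.round(np.clip(x, -1.0, 1.0) * half)
    return codes / half, (codes + half).astype(np.int64)


def grfsq(z, levels, num_groups, num_stages):
    """Illustrative group-residual finite scalar quantization (hypothetical helper).

    z:          array of shape (..., num_groups * len(levels))
    levels:     odd level count for each channel inside a group
    Returns (reconstruction, indices) where indices has shape
    (..., num_groups, num_stages, len(levels)).
    """
    levels = np.asarray(levels, dtype=np.float64)
    bounded = np.tanh(np.asarray(z, dtype=np.float64))  # squash into (-1, 1)
    recon_groups, index_groups = [], []
    for group in np.split(bounded, num_groups, axis=-1):
        recon = np.zeros_like(group)
        scale = np.ones_like(levels)
        stage_indices = []
        for _ in range(num_stages):
            # Quantize the remaining error, rescaled back to the unit range.
            q, idx = quantize_unit((group - recon) / scale, levels)
            recon = recon + q * scale
            stage_indices.append(idx)
            # The next residual is at most 1 / (levels - 1) of the current scale.
            scale = scale / (levels - 1.0)
        recon_groups.append(recon)
        index_groups.append(np.stack(stage_indices, axis=-2))
    return np.concatenate(recon_groups, axis=-1), np.stack(index_groups, axis=-3)


def bits_per_frame(levels, num_groups, num_stages):
    """Bits needed to store the indices of one frame under this scheme."""
    return num_groups * num_stages * float(np.sum(np.log2(levels)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    motion = rng.normal(size=(10, 12))  # 10 frames, 12 hypothetical motion channels
    recon, tokens = grfsq(motion, levels=[7, 5, 5], num_groups=4, num_stages=2)
    print(tokens.shape)
    print(np.abs(np.tanh(motion) - recon).max())
    print(bits_per_frame([7, 5, 5], num_groups=4, num_stages=2))
```

In this sketch each additional residual stage shrinks the worst-case per-channel error by a factor of (levels - 1), while the bit cost per frame grows only linearly with the number of stages. That trade-off is one way to read the pairing of a detailed motion representation with a low bitrate, though the actual configuration used by VQTalker is not given in the text above.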

Taxonomy

Topics: Face recognition and analysis · Human Motion and Animation · Human Pose and Action Recognition

Methods: Sparse Evolutionary Training