Data standardization for robust lip sync

Chun Wang

arXiv:2202.06198·cs.CV·September 10, 2024

TL;DR

This paper introduces a data standardization pipeline for lip sync that disentangles lip motion from the visual input and standardizes everything else, improving the robustness and data efficiency of lip sync methods, especially in challenging real-world scenarios.

Contribution

It proposes a novel data standardization approach based on 3D face reconstruction to disentangle lip motion from distracting factors, enhancing lip sync robustness.

Findings

1. Improved robustness of lip sync methods in wild conditions.
2. Enhanced data efficiency for existing lip sync models.
3. Achieved competitive performance in active speaker detection.

Abstract

Lip sync is a fundamental audio-visual task. However, existing lip sync methods fall short of being robust in the wild. One important cause could be distracting factors on the visual input side, making extracting lip motion information difficult. To address these issues, this paper proposes a data standardization pipeline to standardize the visual input for lip sync. Based on recent advances in 3D face reconstruction, we first create a model that can consistently disentangle lip motion information from the raw images. Then, standardized images are synthesized with disentangled lip motion information, with all other attributes related to distracting factors set to predefined values independent of the input, to reduce their effects. Using synthesized images, existing lip sync methods improve their data efficiency and robustness, and they achieve competitive performance for the active…
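The standardization step described in the abstract can be illustrated with a minimal sketch. It assumes a 3D-morphable-model-style parameterization in which lip motion is carried by expression coefficients; that split, the `FaceParams` fields, the vector sizes, and the all-zero canonical values are hypothetical choices made for illustration and are not taken from the paper. Rendering the standardized parameters back into an image is out of scope here.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class FaceParams:
    """Hypothetical per-frame output of a 3D face reconstruction model."""
    identity: Vec    # face shape of the speaker (distracting factor)
    expression: Vec  # assumed here to carry the lip motion
    pose: Vec        # head rotation and translation (distracting factor)
    lighting: Vec    # illumination coefficients (distracting factor)
    texture: Vec     # skin appearance (distracting factor)


def zeros(n: int) -> Vec:
    return (0.0,) * n


# Predefined values that do not depend on the input (sizes are illustrative).
CANONICAL = FaceParams(
    identity=zeros(4),
    expression=zeros(3),
    pose=zeros(6),
    lighting=zeros(9),
    texture=zeros(4),
)


def standardize(params: FaceParams, canonical: FaceParams = CANONICAL) -> FaceParams:
    """Keep only the expression (lip motion) of the input frame and set every
    other attribute to its predefined canonical value."""
    return replace(canonical, expression=params.expression)


# Two frames with the same lip motion but different speaker, pose, and lighting.
frame_a = FaceParams((0.3, 0.1, 0.0, 0.2), (0.5, -0.2, 0.1),
                     (0.1,) * 6, (0.4,) * 9, (0.7,) * 4)
frame_b = FaceParams((0.9, 0.8, 0.1, 0.0), (0.5, -0.2, 0.1),
                     (0.6,) * 6, (0.0,) * 9, (0.2,) * 4)

std_a = standardize(frame_a)
std_b = standardize(frame_b)
```

In this sketch the two standardized frames are identical because they differ only in the attributes that get overwritten, which is the property a downstream lip sync model would rely on.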

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis