Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

Songtao Luo; Shuang Yang; Shiguang Shan; Xilin Chen

arXiv:2310.05058 · cs.CV · May 1, 2024

TL;DR

This paper introduces a novel speaker-adaptive lip reading method that leverages the different roles of shallow and deep neural network layers to improve robustness and accuracy in lip reading tasks.

Contribution

It proposes a new approach to learn separable hidden unit contributions for shallow and deep layers, enhancing speaker adaptation in lip reading.

Findings

01 Outperforms existing methods on the LRW-ID and GRID datasets.

02 Introduces a new dataset, CAS-VSR-S68h, for evaluation under extreme speaker variation.

03 Demonstrates robustness in diverse and limited speaker scenarios.

Abstract

In this paper, we propose a novel method for speaker adaptation in lip reading, motivated by two observations. Firstly, a speaker's own characteristics can always be portrayed well by a few of his/her facial images, or even a single image, with shallow networks, while the fine-grained dynamic features associated with speech content expressed by the talking face always need deep sequential networks to be represented accurately. Therefore, we treat the shallow and deep layers differently for speaker-adaptive lip reading. Secondly, we observe that a speaker's unique characteristics (e.g., prominent oral cavity and mandible) have varied effects on lip reading performance for different words and pronunciations, necessitating adaptive enhancement or suppression of the features for robust lip reading. Based on these two observations, we propose to take advantage of the speaker's own characteristics to…
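The "adaptive enhancement or suppression of the features" mentioned in the abstract can be illustrated with the classic learning-hidden-unit-contributions (LHUC) idea, in which each hidden unit is rescaled by a speaker-dependent amplitude. The sketch below is a generic illustration and is not taken from the paper or its repository: the function name `lhuc_scale`, the `2 * sigmoid(r)` amplitude, and the toy values are all assumptions made for this example, and the paper's separate treatment of shallow and deep layers is not modeled.

```python
import math


def lhuc_scale(hidden, speaker_params):
    """Rescale each hidden unit by a speaker-dependent amplitude in (0, 2).

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    if len(hidden) != len(speaker_params):
        raise ValueError("hidden and speaker_params must have the same length")
    # Amplitude 2 * sigmoid(r): r = 0 leaves the unit unchanged,
    # r > 0 enhances it (amplitude > 1), r < 0 suppresses it (amplitude < 1).
    amplitudes = [2.0 / (1.0 + math.exp(-r)) for r in speaker_params]
    return [a * h for a, h in zip(amplitudes, hidden)]


# Toy hidden activations and per-speaker parameters (made-up values).
hidden = [0.5, -1.0, 2.0]
speaker_params = [0.0, 3.0, -3.0]
scaled = lhuc_scale(hidden, speaker_params)
```

With these toy parameters, the first unit passes through unchanged, the second is amplified, and the third is strongly attenuated. In a trained model the per-speaker parameters would be learned or predicted from the speaker's data rather than set by hand.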

Code & Models

Repositories

jinchiniao/LSHUC
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis