Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
Songtao Luo, Shuang Yang, Shiguang Shan, Xilin Chen

TL;DR
This paper introduces a novel speaker-adaptive lip reading method that leverages the different roles of shallow and deep neural network layers to improve robustness and accuracy in lip reading tasks.
Contribution
It proposes a new approach to learn separable hidden unit contributions for shallow and deep layers, enhancing speaker adaptation in lip reading.
Findings
The method outperforms existing methods on the LRW-ID and GRID datasets.
The paper introduces a new dataset, CAS-VSR-S68h, for evaluation under extreme speaker variation.
The method remains robust in both diverse-speaker and limited-speaker scenarios.
Abstract
In this paper, we propose a novel method for speaker adaptation in lip reading, motivated by two observations. Firstly, a speaker's own characteristics can always be portrayed well by a few of his/her facial images, or even a single image, with shallow networks, while the fine-grained dynamic features associated with speech content expressed by the talking face always need deep sequential networks to represent accurately. Therefore, we treat the shallow and deep layers differently for speaker-adaptive lip reading. Secondly, we observe that a speaker's unique characteristics (e.g., prominent oral cavity and mandible) have varied effects on lip reading performance for different words and pronunciations, necessitating adaptive enhancement or suppression of the features for robust lip reading. Based on these two observations, we propose to take advantage of the speaker's own characteristics to…
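The abstract is cut off before the formulation is given, so the following is only a rough illustration of what "adaptive enhancement or suppression of the features" can look like. It uses the classic hidden-unit-contribution scaling from speaker adaptation in speech recognition, where each hidden unit is multiplied by a speaker-dependent gain of 2·sigmoid(r). Whether the paper uses this exact gain function is an assumption, and the names `lhuc_scale`, `apply_lhuc`, `hidden`, and `speaker_r` are hypothetical.

```python
import math


def lhuc_scale(r):
    """Map an unconstrained speaker parameter r to a gain in (0, 2)."""
    return 2.0 / (1.0 + math.exp(-r))


def apply_lhuc(features, speaker_params):
    """Scale each hidden unit by its speaker-dependent gain."""
    if len(features) != len(speaker_params):
        raise ValueError("one speaker parameter per hidden unit is required")
    return [h * lhuc_scale(r) for h, r in zip(features, speaker_params)]


# Hypothetical activations of four hidden units for one frame.
hidden = [1.0, 0.5, -2.0, 3.0]

# Hypothetical speaker parameters (assumed, not taken from the paper):
# 0 leaves a unit unchanged, positive values enhance it,
# negative values suppress it.
speaker_r = [0.0, 2.0, -2.0, 0.0]

adapted = apply_lhuc(hidden, speaker_r)
print(adapted)
```

Because the gain is bounded between 0 and 2, a unit can be attenuated toward zero or at most doubled, which keeps the adaptation from overwhelming the speaker-independent features.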
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis
