Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
Linzhi Wu, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Tiejun Liu, Liang Xie, Ye Yan, Erwei Yin

TL;DR
This paper introduces a landmark-guided lip reading method with mutual information regularization to improve cross-speaker robustness by reducing speaker-specific visual variations and capturing speaker-invariant features.
Contribution
It proposes using lip landmarks as input features, together with a mutual information regularization, to make lip reading models more robust to unseen speakers, addressing inter-speaker variability.
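The sketch below gives a rough idea of what landmark-guided input features might look like. Instead of one large mouth-cropped image, it crops a small square patch around each lip landmark in every frame. This is an illustrative assumption. The summary does not say how the paper samples visual clues around landmarks, and `crop_landmark_patches`, `landmark_patch_sequence`, the odd `patch_size` and the constant padding are all hypothetical choices. Frames are plain nested lists so that the example has no dependencies.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # H x W grayscale frame, row-major
Landmark = Tuple[int, int]  # (x, y) pixel coordinates of one lip landmark


def crop_landmark_patches(
    frame: Frame,
    landmarks: Sequence[Landmark],
    patch_size: int = 5,
    pad_value: float = 0.0,
) -> List[Frame]:
    """Hypothetical helper: crop a square patch centred on each lip landmark.

    Pixels outside the frame are filled with `pad_value`, so every patch is
    patch_size x patch_size no matter where the landmark lies.
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so the landmark is the centre pixel")
    height, width = len(frame), len(frame[0])
    half = patch_size // 2
    patches = []
    for x, y in landmarks:
        patch = []
        for row in range(y - half, y + half + 1):
            patch_row = []
            for col in range(x - half, x + half + 1):
                inside = 0 <= row < height and 0 <= col < width
                patch_row.append(frame[row][col] if inside else pad_value)
            patch.append(patch_row)
        patches.append(patch)
    return patches


def landmark_patch_sequence(
    frames: Sequence[Frame],
    landmark_tracks: Sequence[Sequence[Landmark]],
    patch_size: int = 5,
) -> List[List[Frame]]:
    """Hypothetical helper: apply the cropper per frame, giving T x N x P x P patches."""
    return [
        crop_landmark_patches(frame, landmarks, patch_size)
        for frame, landmarks in zip(frames, landmark_tracks)
    ]
```

This keeps only local appearance near the lips. That is one plausible way to reduce speaker-specific context such as face shape or skin far from the mouth. The summary does not confirm that the paper uses this exact design.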
Findings
Improved accuracy in cross-speaker lip reading tasks.
Effective reduction of speaker-specific appearance influence.
Enhanced model generalization across different speakers.
Abstract
Lip reading, the process of interpreting silent speech from visual lip movements, has gained rising attention for its wide range of real-world applications. Deep learning approaches have greatly improved current lip reading systems. However, lip reading in cross-speaker scenarios, where the speaker identity changes, poses a challenging problem due to inter-speaker variability: a well-trained lip reading system may perform poorly when it encounters a new speaker. To learn a speaker-robust lip reading model, a key insight is to reduce visual variations across speakers so that the model does not overfit to specific speakers. In this work, considering both the input visual clues and the latent representations of a hybrid CTC/attention architecture, we propose to exploit lip landmark-guided fine-grained visual clues instead of the commonly used mouth-cropped images as input features, diminishing…
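Hybrid CTC/attention models of the kind named in the abstract are commonly trained with a weighted sum of two losses. The first is a CTC loss on the encoder outputs. The second is a teacher-forced cross-entropy loss on the attention decoder: L = λ·L_ctc + (1 − λ)·L_att. The pure-Python sketch below computes that objective for a single utterance. It is not the paper's code. The function names and the default `ctc_weight=0.1` are illustrative assumptions, since the summary does not give the weighting. Real systems would use a framework implementation such as PyTorch's `torch.nn.CTCLoss`.

```python
import math
from typing import List, Sequence

LogProbs = List[List[float]]  # T x V matrix of per-step log-probabilities


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs: LogProbs, target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` via the forward algorithm in log space."""
    ext = [blank]  # target interleaved with blanks: [b, y1, b, y2, ..., b]
    for label in target:
        ext += [label, blank]
    num_steps, num_states = len(log_probs), len(ext)
    alpha = [-math.inf] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, num_steps):
        new_alpha = [-math.inf] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])  # skip a blank between distinct labels
            new_alpha[s] = _logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha
    # Valid paths end on the last label or on the trailing blank.
    final = [alpha[-1]] + ([alpha[-2]] if num_states > 1 else [])
    return -_logsumexp(final)


def attention_ce_loss(decoder_log_probs: LogProbs, target: Sequence[int]) -> float:
    """Teacher-forced cross-entropy: -sum_u log p(y_u | y_<u, x). <eos> handling omitted."""
    return -sum(step[label] for step, label in zip(decoder_log_probs, target))


def hybrid_loss(
    ctc_log_probs: LogProbs,
    decoder_log_probs: LogProbs,
    target: Sequence[int],
    ctc_weight: float = 0.1,  # illustrative assumption, not taken from the paper
    blank: int = 0,
) -> float:
    """Weighted CTC/attention objective: ctc_weight * L_ctc + (1 - ctc_weight) * L_att."""
    l_ctc = ctc_loss(ctc_log_probs, target, blank)
    l_att = attention_ce_loss(decoder_log_probs, target)
    return ctc_weight * l_ctc + (1.0 - ctc_weight) * l_att
```

As a check, consider two frames with a uniform distribution over {blank, 1} and the target `[1]`. Three alignments collapse to the target (`1 1`, `b 1`, `1 b`), so the CTC loss is −log 0.75. The CTC term encourages monotonic alignments, while the attention term models dependencies between output tokens. Latent-space regularizers such as the mutual information term would be added on top of this objective.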
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis
