Mutual Information Maximization for Effective Lip Reading
Xing Zhao, Shuang Yang, Shiguang Shan, Xilin Chen

TL;DR
This paper introduces mutual information constraints at local and global levels to improve lip reading accuracy by enhancing feature relevance and robustness against noise, achieving state-of-the-art results.
Contribution
It proposes a novel mutual information maximization approach at local and global levels to enhance lip reading models' discriminative power and noise resistance.
Findings
Achieved new state-of-the-art performance on two large-scale benchmarks.
Demonstrated improved discrimination of similar words like 'spend' and 'spending'.
Showed enhanced robustness to pose, lighting, and appearance variations.
Abstract
Lip reading has received increasing research interest in recent years due to the rapid development of deep learning and its widespread potential applications. Good performance on the lip reading task depends heavily on how effectively the representation captures lip movement information while resisting the noise resulting from changes in pose, lighting conditions, speaker's appearance, and so on. To this end, we propose introducing mutual information constraints at both the local feature level and the global sequence level to strengthen the relation between the features and the speech content. On the one hand, we constrain the features generated at each time step to carry a strong relation with the speech content by imposing a local mutual information maximization (LMIM) constraint, leading to improvements over…
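To make the idea of a local mutual information constraint concrete, here is a minimal sketch of how per-time-step features could be scored against the speech content. It assumes a Jensen-Shannon-style lower bound on mutual information, which is one common estimator; the abstract does not say which estimator the authors use. The bilinear critic, the word embeddings, the toy data, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def jsd_mi_lower_bound(scores_pos, scores_neg):
    """Jensen-Shannon-style MI estimate from critic scores (assumed estimator).

    scores_pos: critic scores for matched (feature, label) pairs.
    scores_neg: critic scores for mismatched (feature, label) pairs.
    The value is at most 0 and grows as matched pairs score high
    and mismatched pairs score low.
    """
    return np.mean(-softplus(-scores_pos)) - np.mean(softplus(scores_neg))


def bilinear_critic(features, label_embeddings, W):
    # Hypothetical critic: score[i] = features[i]^T W label_embeddings[i].
    return np.einsum("nd,de,ne->n", features, W, label_embeddings)


# Toy data (all assumed): T time-step features, each tied to a word label.
rng = np.random.default_rng(0)
T, D, num_words = 64, 8, 4
labels = rng.integers(0, num_words, size=T)
word_emb = rng.normal(size=(num_words, D))
features = word_emb[labels] + 0.1 * rng.normal(size=(T, D))
W = np.eye(D)

# Matched pairs use the true label; mismatched pairs use a different word.
wrong_labels = (labels + 1) % num_words
pos = bilinear_critic(features, word_emb[labels], W)
neg = bilinear_critic(features, word_emb[wrong_labels], W)

estimate = jsd_mi_lower_bound(pos, neg)
loss = -estimate  # minimizing this loss maximizes the MI estimate
print(f"MI estimate: {estimate:.4f}, loss: {loss:.4f}")
```

In a real model the critic and the feature extractor would be trained jointly, and this term would be added to the usual classification loss. A global, sequence-level variant would score a whole-sequence representation against the label instead of individual time steps.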
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Indoor and Outdoor Localization Technologies
