Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models
Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr, and Hesham M. Eraqi

TL;DR
This paper introduces a novel approach to lipreading by transferring audio speech recognition knowledge to visual models using cross-modality knowledge distillation, significantly improving lipreading accuracy.
Contribution
It develops a new framework combining sequence and frame-level knowledge distillation, leveraging audio data during training of visual speech recognition models, and proposes an efficient Gaussian averaging technique.
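As a rough illustration of how sequence-level and frame-level distillation terms might be combined, here is a minimal sketch in plain Python. It is not the authors' implementation, and the summary above does not specify the exact losses. The sketch assumes a generic temperature-softened KL term on word-class logits for the sequence level and a mean-squared feature-matching term for the frame level. `gaussian_pool` is one plausible reading of "Gaussian averaging": it resamples the audio teacher's frames onto the video student's frame grid with Gaussian weights. All function names, `temperature`, and `sigma` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed sequence-level term: KL(teacher || student) on softened word posteriors."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(max(s, 1e-12)))
        for t, s in zip(p_t, p_s)
        if t > 0.0
    )
    # The T^2 factor is the usual rescaling in temperature-based distillation.
    return kl * temperature ** 2


def gaussian_pool(teacher_frames, num_student_frames, sigma=1.0):
    """Hypothetical alignment step: Gaussian-weighted average of teacher frames
    around the position of each student frame on the teacher time axis."""
    n = len(teacher_frames)
    dim = len(teacher_frames[0])
    pooled = []
    for i in range(num_student_frames):
        center = (i + 0.5) * n / num_student_frames - 0.5
        weights = [math.exp(-0.5 * ((j - center) / sigma) ** 2) for j in range(n)]
        total = sum(weights)
        pooled.append([
            sum(w * frame[d] for w, frame in zip(weights, teacher_frames)) / total
            for d in range(dim)
        ])
    return pooled


def frame_kd_loss(student_frames, teacher_frames, sigma=1.0):
    """Assumed frame-level term: MSE between student features and pooled teacher features."""
    pooled = gaussian_pool(teacher_frames, len(student_frames), sigma)
    squared = [
        (s - t) ** 2
        for s_frame, t_frame in zip(student_frames, pooled)
        for s, t in zip(s_frame, t_frame)
    ]
    return sum(squared) / len(squared)
```

In a training loop, the two terms would typically be added to the ordinary word-classification loss with tunable weights. The audio teacher is only needed during training, so inference remains video-only.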
Findings
Achieved 88.64% accuracy on the LRW dataset.
Set a new benchmark for lipreading performance.
Demonstrated the effectiveness of cross-modality knowledge transfer.
Abstract
In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training. Impressive progress in the domain of speech recognition has been exhibited by audio and audio-visual systems. Nevertheless, there is still much to be explored with regards to visual speech recognition systems due to the visual ambiguity of some phonemes. To this end, the development of visual speech recognition models is crucial given the instability of audio models. The main contributions of this work are i) building on recent state-of-the-art word-based lipreading models by integrating sequence-level and frame-level Knowledge Distillation (KD) to their systems; ii) leveraging audio data during training visual models, a feat which has not been utilized in prior…
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Advanced Adaptive Filtering Techniques
Methods: Knowledge Distillation
