Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, and Yong Man Ro

TL;DR
This paper introduces a lip reading framework for low-resource languages that leverages general speech knowledge from high-resource languages and language-specific knowledge via a novel memory-augmented decoder, enabling effective lip reading with limited data.
Contribution
It proposes a new method combining general speech knowledge and language-specific memory-augmented decoding for low-resource lip reading.
Findings
Effective lip reading for five languages demonstrated
Significant improvement over baseline models
General and language-specific knowledge integration enhances performance
Abstract
This paper proposes a novel lip reading framework aimed at low-resource languages, a setting that has not been well addressed in the previous literature. Because low-resource languages lack enough video-text paired data to train a model with sufficient capacity to model both lip movements and language, developing lip reading models for them is regarded as challenging. To mitigate this challenge, we first learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units. Different languages are known to partially share common phonemes, so general speech knowledge learned from one language can be extended to other languages. We then learn language-specific knowledge, the ability to model language, by proposing the Language-specific Memory-augmented Decoder (LMDecoder). LMDecoder…
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Indoor and Outdoor Localization Technologies
