Decoding visemes: improving machine lipreading
Helen L. Bear, Richard Harvey

TL;DR
This paper introduces a novel two-pass training method for phoneme classifiers in machine lip-reading, leveraging viseme training to significantly enhance classification accuracy over previous approaches.
Contribution
It presents a new training algorithm that uses viseme classifiers to improve phoneme classification in lip-reading systems.
Findings
The new training algorithm yields classification performance that significantly improves on previous lip-reading results.
Phoneme classification can outperform viseme classification under certain conditions.
The two-pass training method effectively leverages viseme information for better phoneme recognition.
Abstract
To undertake machine lip-reading, we try to recognise speech from a visual signal. Current work often uses viseme classification supported by language models with varying degrees of success. A few recent works suggest phoneme classification, in the right circumstances, can outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers which uses previously trained visemes in the first pass. With our new training algorithm, we show classification performance which significantly improves on previous lip-reading results.
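The two-pass idea in the abstract can be sketched as a toy example. This is only one plausible reading, not the authors' implementation: the abstract does not specify the classifier type, so the sketch swaps in one-dimensional class means where a real system would use proper sequence models. The phoneme-to-viseme map, the function names (`fit_means`, `two_pass_train`, `classify`) and the `blend` parameter are all hypothetical and chosen for illustration.

```python
from statistics import fmean

# Hypothetical phoneme-to-viseme map (illustrative only, not taken from the paper).
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
}


def fit_means(samples, labels):
    """Return {label: mean feature value} for 1-D features."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {y: fmean(xs) for y, xs in grouped.items()}


def two_pass_train(samples, phoneme_labels, blend=0.5):
    """Toy two-pass training: viseme models first, phoneme models second."""
    # Pass 1: train viseme models on labels collapsed through the map.
    viseme_labels = [PHONEME_TO_VISEME[p] for p in phoneme_labels]
    viseme_means = fit_means(samples, viseme_labels)

    # Pass 2: start each phoneme model from its parent viseme model,
    # then move it toward the phoneme-specific data. The blend weight is
    # an assumed stand-in for whatever re-estimation a real system uses.
    phoneme_means = fit_means(samples, phoneme_labels)
    return {
        p: (1 - blend) * viseme_means[PHONEME_TO_VISEME[p]] + blend * mean
        for p, mean in phoneme_means.items()
    }


def classify(model, x):
    """Return the phoneme whose model mean is closest to x."""
    return min(model, key=lambda p: abs(model[p] - x))
```

With made-up features such as `two_pass_train([1.0, 2.0, 3.0, 7.0, 9.0], ["p", "b", "m", "f", "v"])`, each phoneme model starts at its viseme's mean and is then pulled toward its own data, so phonemes that share a viseme stay close together but remain distinguishable.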
Taxonomy
Topics: Speech and Audio Processing · Phonetics and Phonology Research · Hearing Impairment and Communication
