Understanding the visual speech signal
Helen L Bear

TL;DR
This paper explores the visual speech signal, with a focus on visemes, to improve machine lipreading. It analyzes speaker variability and the utility of visemes, and its insights also benefit speech therapy, animation, and psychology.
Contribution
It offers a detailed analysis of visemes and their variability across speakers, and demonstrates methods to enhance lipreading accuracy using visemes.
Findings
Visemes vary significantly between speakers.
Using visemes can improve lipreading accuracy.
The insights apply beyond machine lipreading, to speech therapy, animation, and psychology.
Abstract
For machines to lipread, or understand speech from lip movement, they decode lip-motions (known as visemes) into the spoken sounds. We investigate the visual speech channel to further our understanding of visemes. This has applications beyond machine lipreading; speech therapists, animators, and psychologists can benefit from this work. We explain the influence of speaker individuality, and demonstrate how one can use visemes to boost lipreading.
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Music Technology and Sound Studies
