Visual gesture variability between talkers in continuous visual speech
Helen L Bear

TL;DR
This paper investigates how visual gesture variability between talkers affects lipreading of continuous speech, revealing that viseme trajectories significantly influence both speaker differentiation and system performance.
Contribution
The paper extends prior work on speaker-dependent visemes from isolated words to continuous speech, analyzing the impact of viseme trajectories on speaker-dependent lipreading systems.
Findings
Viseme trajectory variability impacts speaker differentiation in continuous speech.
Continuous speech poses greater challenges than isolated words for lipreading systems.
Speaker-dependent viseme mappings are influenced by gesture variability.
Abstract
Recent adoption of deep learning methods in machine lipreading research gives us two options for improving system performance. Either we develop end-to-end systems holistically, or we experiment to further our understanding of the visual speech signal. The latter option is more difficult, but this knowledge would enable researchers both to improve systems and to apply the new knowledge to other domains such as speech therapy. One challenge in lipreading systems is the correct labeling of the classifiers. These labels are an estimated mapping between the visemes on the lips and the phonemes uttered. Here we ask whether such maps are speaker-dependent. Prior work investigated isolated word recognition with speaker-dependent (SD) visemes; we extend this to continuous speech. Benchmarked against SD results and isolated-word performance, we test with RMAV dataset speakers and…
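The abstract describes classifier labels as a map from phonemes to visemes and asks whether that map differs between speakers. As a minimal sketch, assuming a map is simply a dictionary from phoneme to viseme label, the Python below relabels one phoneme transcription with two hypothetical speakers' maps and checks whether the maps group phonemes the same way. The phoneme groupings, label names, and function names (`to_visemes`, `groupings`, `same_mapping`) are invented for illustration; they are not the maps or the method reported in the paper.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

P2VMap = Dict[str, str]  # phoneme -> viseme label

# Hypothetical speaker-dependent maps, invented for illustration only.
SPEAKER_A: P2VMap = {
    "p": "v1", "b": "v1", "m": "v1",             # bilabials share a viseme
    "f": "v2", "v": "v2",                        # labiodentals
    "t": "v3", "d": "v3", "s": "v3", "z": "v3",  # /t d s z/ look alike
    "ae": "v4",
}
SPEAKER_B: P2VMap = {
    "p": "v1", "b": "v1", "m": "v1",
    "f": "v2", "v": "v2",
    "t": "v3", "d": "v3",
    "s": "v4", "z": "v4",  # this speaker visibly separates /s z/ from /t d/
    "ae": "v5",
}


def to_visemes(phonemes: List[str], p2v: P2VMap) -> List[str]:
    """Relabel a phoneme transcription as viseme classifier labels."""
    return [p2v[p] for p in phonemes]


def groupings(p2v: P2VMap) -> Set[FrozenSet[str]]:
    """Return the phoneme clusters a map defines, ignoring label names."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for phoneme, viseme in p2v.items():
        groups[viseme].add(phoneme)
    return {frozenset(g) for g in groups.values()}


def same_mapping(x: P2VMap, y: P2VMap) -> bool:
    """True if two maps cluster the phonemes identically."""
    return groupings(x) == groupings(y)


if __name__ == "__main__":
    sat = ["s", "ae", "t"]
    print(to_visemes(sat, SPEAKER_A))  # ['v3', 'v4', 'v3']
    print(to_visemes(sat, SPEAKER_B))  # ['v4', 'v5', 'v3']
    print(same_mapping(SPEAKER_A, SPEAKER_B))  # False
```

Comparing groupings rather than label strings is deliberate: viseme names are arbitrary, so two maps that cluster the same phonemes under different names are equivalent, while maps that cluster them differently give a speaker-dependent labeling of the same utterance.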
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Music and Audio Processing
