Finding phonemes: improving machine lip-reading

Helen L. Bear; Richard W. Harvey; Yuxuan Lan

arXiv:1710.01142 · cs.CV · April 26, 2018 · 5 cites

Open Access

TL;DR

This paper explores how different phoneme-to-viseme mappings affect speaker-dependent machine lip-reading, demonstrating that phoneme classifiers can outperform viseme classifiers and that intermediate units may offer further improvements.

Contribution

The paper introduces a structured method for creating speaker-dependent phoneme-to-viseme maps with varying viseme counts, and shows how the viseme count affects word recognition correctness.

Findings

1. Word recognition with phoneme classifiers is often better than word recognition with viseme classifiers.

2. Intermediate units between visemes and phonemes can improve recognition accuracy.

3. The number of visemes in a phoneme-to-viseme map affects lip-reading performance, measured by word recognition correctness.

Abstract

In machine lip-reading there is continued debate and research around the correct classes to be used for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps where each has a different quantity of visemes ranging from two to 45. Viseme classes are based upon the mapping of articulated phonemes, which have been confused during phoneme recognition, into viseme groups. Using these maps, with the LiLIR dataset, we show the effect of changing the viseme map size in speaker-dependent machine lip-reading, measured by word recognition correctness and so demonstrate that word recognition with phoneme classifiers is not just possible, but often better than word recognition with viseme classifiers. Furthermore, there are intermediate units between visemes and phonemes which are better…
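The abstract describes grouping phonemes that were confused during phoneme recognition into viseme classes, and producing a family of maps with different viseme counts. A minimal sketch of that general idea follows. It is not the authors' procedure: the greedy merge rule, the function name `build_p2v_map`, the confusion-count format, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def build_p2v_map(phonemes, confusion, n_visemes):
    """Greedily merge the most-confused classes until n_visemes remain.

    Hypothetical helper: confusion[(p, q)] is the number of times phoneme p
    was recognised as phoneme q. Returns a dict phoneme -> viseme label.
    """
    if not 1 <= n_visemes <= len(phonemes):
        raise ValueError("n_visemes must be between 1 and the number of phonemes")

    # Start with one class per phoneme (the largest possible map).
    classes = [frozenset([p]) for p in phonemes]

    def confusion_between(a, b):
        # Symmetric confusion count summed over all cross-class phoneme pairs.
        return sum(confusion.get((p, q), 0) + confusion.get((q, p), 0)
                   for p in a for q in b)

    while len(classes) > n_visemes:
        # Assumed merge rule: join the pair of classes confused most often.
        a, b = max(combinations(classes, 2),
                   key=lambda pair: confusion_between(*pair))
        classes = [c for c in classes if c not in (a, b)] + [a | b]

    return {p: f"V{i}" for i, cls in enumerate(classes) for p in cls}


# Toy data, invented for illustration only (not results from the paper).
PHONEMES = ["p", "b", "m", "f", "v"]
CONFUSION = {("p", "b"): 5, ("b", "m"): 4, ("f", "v"): 6, ("p", "f"): 1}

# One map per viseme-set size, from two classes up to one class per phoneme.
maps = {k: build_p2v_map(PHONEMES, CONFUSION, k)
        for k in range(2, len(PHONEMES) + 1)}
```

With this toy data, the two-class map places "p", "b", and "m" in one viseme and "f" and "v" in the other. In the paper's setting the same kind of loop would run per speaker over the full phoneme set, giving maps from two visemes up to 45.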

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Speech Recognition and Synthesis