Phoneme-to-viseme mappings: the good, the bad, and the ugly

Helen L Bear; Richard Harvey

arXiv:1805.02934 · cs.CV · May 9, 2018

TL;DR

This paper investigates the impact of different phoneme-to-viseme mappings on lip-reading performance, introduces a new data-driven algorithm for creating improved viseme sets, and demonstrates that these new visemes outperform existing ones.

Contribution

It provides a comparative analysis of viseme-to-phoneme mappings and proposes a novel algorithm for constructing more effective viseme units from speech data.
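The construction algorithm is only described at a high level here. The following is a minimal sketch of one generic data-driven approach, assuming a phoneme confusion matrix from a phoneme-level lip-reading classifier is available. Phonemes are greedily grouped by mutual confusion, and two groups merge only if every phoneme in one is confused with every phoneme in the other. This is not claimed to be the authors' algorithm. The function name `cluster_visemes`, the `min_count` threshold and the confusion counts are all hypothetical, invented for illustration.

```python
from itertools import combinations


def cluster_visemes(phonemes, confusion, min_count=1):
    """Greedily group phonemes into viseme classes (hypothetical sketch).

    confusion[i][j] counts how often phoneme i was recognised as phoneme j.
    Two groups merge only if every cross pair is mutually confused at least
    `min_count` times (a clique constraint), so all phonemes in a class are
    confusable with one another.
    """
    n = len(phonemes)
    # Symmetrise: mutual confusion between i and j, ignoring direction.
    mutual = [[confusion[i][j] + confusion[j][i] for j in range(n)] for i in range(n)]
    # Start with every phoneme in its own group.
    group_of = list(range(n))
    members = {g: {g} for g in range(n)}
    # Visit phoneme pairs from most to least mutually confused.
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: mutual[p[0]][p[1]], reverse=True)
    for i, j in pairs:
        if mutual[i][j] < min_count:
            break  # every remaining pair is confused even less
        gi, gj = group_of[i], group_of[j]
        if gi == gj:
            continue
        # Merge only if all cross pairs are mutually confused.
        if all(mutual[a][b] >= min_count for a in members[gi] for b in members[gj]):
            for b in members[gj]:
                group_of[b] = gi
            members[gi] |= members.pop(gj)
    return [sorted(phonemes[k] for k in grp) for grp in members.values()]


# Made-up confusion counts, NOT data from the paper.
phonemes = ["p", "b", "m", "f", "v", "t"]
confusion = [
    [10, 5, 3, 0, 0, 0],
    [4, 9, 2, 0, 0, 0],
    [2, 3, 11, 0, 0, 0],
    [0, 0, 0, 12, 6, 0],
    [0, 0, 0, 5, 10, 0],
    [0, 0, 0, 0, 0, 15],
]
visemes = cluster_visemes(phonemes, confusion)
print(visemes)  # → [['b', 'm', 'p'], ['f', 'v'], ['t']]
```

With these invented counts, /p/, /b/ and /m/ form one class and /f/, /v/ another, while /t/, never confused with anything, stays a singleton. The clique rule stops a class from chaining together two phonemes that are each confused with a third but not with each other. A real system would also have to decide things this sketch ignores, such as whether to treat vowels and consonants separately.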

Findings

01

Different viseme-to-phoneme maps significantly affect recognition performance.

02

New 'Bear' visemes outperform existing viseme sets.

03

Data-driven viseme construction improves lip-reading accuracy.

Abstract

Visemes are the visual equivalent of phonemes. Although not precisely defined, a working definition of a viseme is "a set of phonemes which have identical appearance on the lips". Therefore, each phoneme falls into one viseme class, but a viseme may represent many phonemes: a many-to-one mapping. This mapping introduces ambiguity between phonemes when using viseme classifiers. Not only is this ambiguity damaging to the performance of audio-visual classifiers operating on real expressive speech, there is also considerable choice between possible mappings. In this paper we explore the issue of this choice of viseme-to-phoneme map. We show that there is a definite difference in performance between viseme-to-phoneme mappings and explore why some maps appear to work better than others. We also devise a new algorithm for constructing phoneme-to-viseme mappings from labeled speech data. These new…
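The many-to-one mapping and the ambiguity it causes can be made concrete with a small sketch. The map `P2V` below is a hypothetical toy map, not one of the published maps compared in the paper, and the viseme labels `V1` to `V4` and the helper names are invented for illustration. Inverting the map shows each viseme standing for several phonemes, and words that differ only in phonemes from the same viseme class become indistinguishable.

```python
from collections import defaultdict

# Hypothetical, illustrative phoneme-to-viseme map: each phoneme belongs
# to exactly one viseme, but a viseme may cover several phonemes.
P2V = {
    "p": "V1", "b": "V1", "m": "V1",  # bilabials
    "f": "V2", "v": "V2",             # labiodentals
    "t": "V3", "d": "V3", "n": "V3",  # alveolars
    "ae": "V4",
}


def invert(p2v):
    """Build the one-to-many viseme-to-phoneme map."""
    v2p = defaultdict(set)
    for phoneme, viseme in p2v.items():
        v2p[viseme].add(phoneme)
    return dict(v2p)


def to_visemes(phonemes, p2v=P2V):
    """Translate a phoneme transcription into a viseme sequence."""
    return tuple(p2v[p] for p in phonemes)


V2P = invert(P2V)
# "pat", "bat" and "mat" differ only in their first phoneme, yet all three
# collapse to the same viseme sequence, so a viseme classifier alone
# cannot tell them apart.
words = {"pat": ["p", "ae", "t"], "bat": ["b", "ae", "t"], "mat": ["m", "ae", "t"]}
sequences = {w: to_visemes(ph) for w, ph in words.items()}
print(sequences["pat"])        # → ('V1', 'V4', 'V3')
print(sorted(V2P["V1"]))       # → ['b', 'm', 'p']
```

Because `V2P` maps one viseme back to a set of phonemes, any decoder working from visemes must resolve that set using some other source of information, which is the ambiguity the abstract refers to.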
