Comparing heterogeneous visual gestures for measuring the diversity of visual speech signals
Helen L Bear, Richard Harvey

TL;DR
This paper investigates the diversity of visual speech signals by analyzing how different speakers use mouth gestures, using phoneme-clustering to create viseme maps and measure inter-speaker similarities.
Contribution
It introduces a phoneme-clustering method to generate viseme maps for individual and multiple speakers, providing insights into visual speech variability.
Findings
Speakers generally share the same set of mouth gestures.
Differences among speakers lie in how gestures are used.
Some evidence suggests that visual speech is highly speaker-dependent.
Abstract
Visual lip gestures observed whilst lipreading have a few working definitions; the two most common are 'the visual equivalent of a phoneme' and 'phonemes which are indistinguishable on the lips'. To date there is no formal definition, in part because we have not established a two-way relationship or mapping between visemes and phonemes. Some evidence suggests that visual speech is highly dependent upon the speaker. So here, we use a phoneme-clustering method to form new phoneme-to-viseme maps for both individual and multiple speakers. We test these phoneme-to-viseme maps to examine how similarly speakers talk visually, and we use signed rank tests to measure the distance between individuals. We conclude that, broadly speaking, speakers have the same repertoire of mouth gestures; where they differ is in the use of the gestures.
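The abstract mentions signed rank tests as the measure of distance between speakers but does not say how observations are paired. As a rough illustration only, the sketch below computes the Wilcoxon signed-rank sums for two paired lists of scores. It assumes each pair holds the scores obtained on the same test item under two different speakers' viseme maps. The function name `signed_rank_statistic` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def signed_rank_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (w_plus, w_minus), the Wilcoxon signed-rank sums for paired samples.

    Pairs with zero difference are dropped; tied absolute differences
    share the average of the ranks they span.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")

    # Keep only the non-zero paired differences.
    diffs: List[float] = [x - y for x, y in zip(a, b) if x != y]

    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))

    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Ranks are 1-based; tied values share the average rank of the run.
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical paired scores (illustrative values, not results from the paper).
scores_map_a = [10, 12, 15, 20]
scores_map_b = [8, 13, 11, 14]
w_plus, w_minus = signed_rank_statistic(scores_map_a, scores_map_b)
```

A large imbalance between the two sums suggests that one condition scores systematically higher than the other, while similar sums suggest the paired conditions behave alike. Turning the sums into a p-value requires the signed-rank null distribution or a normal approximation, which this sketch leaves out.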
