Speaker-independent machine lip-reading with speaker-dependent viseme classifiers

Helen L. Bear; Stephen J. Cox; Richard W. Harvey

arXiv:1710.01122 · cs.CV · April 26, 2018 · 1 cites

TL;DR

This paper investigates speaker-independent machine lip-reading using speaker-dependent viseme classifiers. It finds that speakers share similar mouth gestures but differ in how they use them, which affects lip-reading accuracy.

Contribution

The study introduces a phoneme-clustering method to form phoneme-to-viseme maps for individual and multiple speakers, advancing speaker-independent lip-reading techniques.

Findings

01

Speakers share similar mouth gestures but differ in their usage.

02

Speaker-dependent viseme classifiers improve lip-reading accuracy.

03

Visual speech is highly speaker-dependent, affecting model generalization.

Abstract

In machine lip-reading, which is the identification of speech from visual-only information, there is evidence that visual speech is highly dependent upon the speaker [1]. Here, we use a phoneme-clustering method to form new phoneme-to-viseme maps for both individual and multiple speakers. We use these maps to examine how similarly speakers talk visually. We conclude that, broadly speaking, speakers have the same repertoire of mouth gestures; where they differ is in the use of the gestures.
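
The abstract does not describe the clustering procedure itself, so the following is only a minimal sketch of how a phoneme-to-viseme map could be derived by clustering. It assumes, hypothetically, that a phoneme confusion matrix from a visual-only recognizer is available, and it greedily merges phoneme classes that are mutually confused. The function name `cluster_phonemes`, the `threshold` parameter, the viseme labels, and the toy counts are illustrative assumptions, not the authors' method or data.

```python
from itertools import combinations


def cluster_phonemes(phonemes, confusion, threshold=1):
    """Greedily merge mutually confused phonemes into viseme classes.

    confusion[i][j] is an assumed count of phoneme i being recognized
    as phoneme j. Returns a dict mapping each phoneme to a viseme label.
    """
    clusters = [[i] for i in range(len(phonemes))]

    def linkage(a, b):
        # Complete linkage: the weakest symmetric confusion between any
        # member of cluster a and any member of cluster b.
        return min(confusion[i][j] + confusion[j][i] for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the strongest mutual confusion.
        (x, y), score = max(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if score < threshold:
            break  # no remaining pair is confusable enough to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: combinations guarantees x < y

    return {phonemes[i]: f"V{k + 1}"
            for k, members in enumerate(clusters) for i in members}


# Toy confusion counts for one hypothetical speaker (invented for illustration).
phonemes = ["p", "b", "m", "f", "v"]
confusion = [
    [5, 3, 2, 0, 0],
    [3, 5, 2, 0, 0],
    [2, 2, 6, 0, 0],
    [0, 0, 0, 4, 4],
    [0, 0, 0, 3, 5],
]
viseme_map = cluster_phonemes(phonemes, confusion)
```

Under these assumptions, running the same function on one speaker's confusion matrix would give a speaker-dependent map, while summing the matrices of several speakers before clustering would give a multi-speaker map. Comparing such maps is one way to examine how similarly speakers talk visually.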

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Multisensory perception and integration