Understanding the visual speech signal

Helen L Bear

arXiv:1710.01351·cs.CV·April 26, 2018

Open Access

TL;DR

This paper explores the visual speech signal, focusing on visemes, to improve machine lipreading. By analyzing speaker variability and the usefulness of visemes, it also offers insights relevant to speech therapy, animation, and psychology.

Contribution

It analyzes visemes and their variability across speakers, and demonstrates how visemes can be used to improve lipreading accuracy.

Findings

01. Visemes vary significantly between speakers.

02. Using visemes can improve lipreading accuracy.

03. The insights apply beyond machine lipreading, to speech therapy, animation, and psychology.

Abstract

For machines to lipread, or understand speech from lip movement, they decode lip-motions (known as visemes) into the spoken sounds. We investigate the visual speech channel to further our understanding of visemes. This has applications beyond machine lipreading; speech therapists, animators, and psychologists can benefit from this work. We explain the influence of speaker individuality, and demonstrate how one can use visemes to boost lipreading.

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Music Technology and Sound Studies