The speaker-independent lipreading play-off; a survey of lipreading machines

Jake Burton; David Frank; Madhi Saleh; Nassir Navab; Helen L. Bear

arXiv:1810.10597 · cs.CV · October 26, 2018

TL;DR

This survey benchmarks speaker-independent lipreading methods and shows that the best current models reach around 70% accuracy, highlighting the gap with speaker-dependent systems.

Contribution

It offers the first systematic benchmark of speaker-independent lipreading on TCD-TIMIT, comparing conventional and deep learning approaches.

Findings

01. The best speaker-independent accuracy is 69.58%, achieved with CNN features and an SVM classifier.

02. Speaker-independent performance is lower than speaker-dependent performance but surpasses previous reports.

03. Provides a comprehensive benchmark for future research in speaker-independent lipreading.

Abstract

Lipreading is a difficult gesture classification task. One problem in computer lipreading is speaker independence: achieving the same accuracy on test speakers excluded from the training set as on speakers within it. The current literature on speaker-independent lipreading is limited, and the few accuracy scores reported for independent test speakers are usually aggregated with those for dependent test speakers into an averaged performance, which leaves the independent results unclear. Here we undertake a systematic survey of experiments with the TCD-TIMIT dataset, using both conventional approaches and deep learning methods, to provide a series of wholly speaker-independent benchmarks, and show that the best speaker-independent machine scores 69.58% accuracy with CNN features and an SVM classifier. This is less than state-of-the-art speaker-dependent lipreading machines,…
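To make the distinction between speaker-dependent and speaker-independent scoring concrete, here is a minimal sketch, not the authors' evaluation code. It assumes each test result is a hypothetical `(speaker_id, predicted_label, true_label)` tuple; the speaker IDs, labels, and the helper names `speaker_disjoint_split`, `accuracy`, and `report` are all illustrative, and the actual TCD-TIMIT speaker split used in the paper is not given here. The sketch holds out whole speakers from training and reports accuracy on seen and unseen speakers separately, alongside the pooled figure that can blur the two.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout (assumption): (speaker_id, predicted_label, true_label).
Sample = Tuple[str, str, str]


def speaker_disjoint_split(
    speakers: Sequence[str], test_speakers: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Partition speaker IDs so that no held-out test speaker is used for training."""
    held_out = set(test_speakers)
    train = [s for s in speakers if s not in held_out]
    test = [s for s in speakers if s in held_out]
    return train, test


def accuracy(samples: Sequence[Sample]) -> Optional[float]:
    """Fraction of samples whose prediction matches the label; None if there are none."""
    if not samples:
        return None
    correct = sum(1 for _, pred, true in samples if pred == true)
    return correct / len(samples)


def report(
    samples: Sequence[Sample], train_speakers: Sequence[str]
) -> Dict[str, Optional[float]]:
    """Score seen (dependent) and unseen (independent) test speakers separately."""
    seen = set(train_speakers)
    dependent = [s for s in samples if s[0] in seen]
    independent = [s for s in samples if s[0] not in seen]
    return {
        "dependent": accuracy(dependent),
        "independent": accuracy(independent),
        # The pooled figure mixes both groups and can hide a weak independent score.
        "pooled": accuracy(samples),
    }


if __name__ == "__main__":
    # Toy data (hypothetical): s04 is held out entirely from training.
    train, test = speaker_disjoint_split(["s01", "s02", "s03", "s04"], ["s04"])
    samples = [
        ("s01", "a", "a"), ("s01", "b", "b"),
        ("s02", "a", "a"), ("s02", "b", "a"),
        ("s04", "a", "b"), ("s04", "b", "b"),
    ]
    print(train, test)
    print(report(samples, train))
```

With this toy data the dependent accuracy is 0.75 and the independent accuracy is 0.5, while the pooled accuracy of about 0.67 sits between them, which is the kind of averaging the abstract warns makes independent results unclear.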

Taxonomy

Topics: Hearing Impairment and Communication · Hand Gesture Recognition Systems · Speech and Audio Processing