Improving Speaker-Independent Lipreading with Domain-Adversarial Training

Michael Wand; Juergen Schmidhuber

arXiv:1708.01565 · cs.CV · August 7, 2017

TL;DR

This paper introduces a lipreading system that employs domain-adversarial training to achieve speaker-independent recognition, significantly improving accuracy with minimal untranscribed target data.

Contribution

It integrates domain-adversarial training into a lipreading neural network to enhance speaker independence with limited target speaker data.

Findings

01

Achieves a relative accuracy improvement of around 40% on pairs of different source and target speakers, using only 15 to 20 seconds of untranscribed target speech.

02

Adapts to a target speaker using only a very small amount of untranscribed target data.

03

Accuracy gains in multi-speaker training setups are smaller but still substantial.

Abstract

We present a Lipreading system, i.e. a speech recognition system using only visual features, which uses domain-adversarial training for speaker independence. Domain-adversarial training is integrated into the optimization of a lipreader based on a stack of feedforward and LSTM (Long Short-Term Memory) recurrent neural networks, yielding an end-to-end trainable system which only requires a very small number of frames of untranscribed target data to substantially improve the recognition accuracy on the target speaker. On pairs of different source and target speakers, we achieve a relative accuracy improvement of around 40% with only 15 to 20 seconds of untranscribed target speech data. On multi-speaker training setups, the accuracy improvements are smaller but still substantial.
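The core mechanism of domain-adversarial training, gradient reversal on a shared feature extractor, can be illustrated with a toy sketch. This is not the authors' implementation: the scalar model, the squared-error losses, and the names `dann_gradients`, `w`, `a`, `b`, and `lam` are all assumptions made for illustration (the paper's system uses feedforward and LSTM layers). It shows how an untranscribed target sample (`y=None`) still contributes a training signal through the domain head.

```python
def dann_gradients(w, a, b, x, d, lam, y=None):
    """Gradients for one sample in a toy scalar domain-adversarial model.

    Hypothetical setup (not from the paper):
      feature      f   = w * x
      label head   L_y = (a * f - y) ** 2   (skipped when y is None)
      domain head  L_d = (b * f - d) ** 2   (d encodes the speaker/domain)
    """
    f = w * x
    err_d = b * f - d
    # Untranscribed samples have no label, so the label loss contributes nothing.
    err_y = 0.0 if y is None else a * f - y

    # Both heads minimise their own loss with ordinary gradients.
    grad_a = 2.0 * err_y * f
    grad_b = 2.0 * err_d * f

    # Gradient of each loss with respect to the shared feature weight.
    grad_w_label = 2.0 * err_y * a * x
    grad_w_domain = 2.0 * err_d * b * x

    # Gradient reversal: the domain gradient is sign-flipped and scaled by lam,
    # pushing the shared feature to become less informative about the domain.
    grad_w = grad_w_label - lam * grad_w_domain
    return grad_w, grad_a, grad_b


# Transcribed source sample: both losses act on the shared weight.
print(dann_gradients(1.0, 1.0, 1.0, 2.0, d=0.0, lam=0.5, y=1.0))
# Untranscribed target sample: only the reversed domain gradient reaches w.
print(dann_gradients(1.0, 1.0, 1.0, 2.0, d=0.0, lam=0.5))
```

Setting `lam` to zero recovers ordinary supervised training of the feature weight, which makes the adversarial term easy to ablate in a sketch like this.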

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory