Speaker-Invariant Training via Adversarial Learning

Zhong Meng; Jinyu Li; Zhuo Chen; Yong Zhao; Vadim Mazalov; Yifan Gong; Biing-Hwang (Fred) Juang

arXiv:1804.00732 · eess.AS · May 1, 2019

TL;DR

This paper introduces a speaker-invariant training method using adversarial multi-task learning to improve speech recognition accuracy by reducing speaker variability without explicit speaker normalization.

Contribution

The novel adversarial training scheme (SIT) learns speaker-invariant features for DNN acoustic models, enhancing ASR performance without relying on speaker-specific transformations.
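The adversarial scheme can be sketched as a pair of coupled optimization problems. The notation is an assumption introduced here for illustration and does not appear on this page: \theta_f are the parameters of the shared feature extractor, \theta_y those of the senone classifier, \theta_s those of the speaker classifier, and \lambda > 0 is a weight on the speaker loss.

```latex
\begin{aligned}
(\hat{\theta}_f, \hat{\theta}_y) &= \arg\min_{\theta_f,\, \theta_y}
  \; \mathcal{L}_{\text{senone}}(\theta_f, \theta_y)
  - \lambda \, \mathcal{L}_{\text{speaker}}(\theta_f, \hat{\theta}_s) \\
\hat{\theta}_s &= \arg\min_{\theta_s}
  \; \mathcal{L}_{\text{speaker}}(\hat{\theta}_f, \theta_s)
\end{aligned}
```

Under this reading, the speaker classifier tries to identify the speaker from the deep feature, while the feature extractor is pushed to make that task harder and the senone task easier.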

Findings

01

Achieved a 4.99% relative word error rate (WER) reduction on the CHiME-3 dataset.

02

With additional unsupervised speaker adaptation, the speaker-adapted SIT model achieved a 4.86% relative WER gain over the speaker-adapted baseline.

03

Demonstrated that adversarial multi-task learning is effective for learning speaker-invariant features.

Abstract

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In SIT, a DNN acoustic model and a speaker classifier network are jointly optimized to minimize the senone (tied triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss. A speaker-invariant and senone-discriminative deep feature is learned through this adversarial multi-task learning. With SIT, a canonical DNN acoustic model with significantly reduced variance in its output probabilities is learned with no explicit speaker-independent (SI) transformations or speaker-specific representations used in training or testing. Evaluated on the…
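The joint optimization described in the abstract is commonly realized with a gradient-reversal update, and the following is a minimal sketch of one such step, not the authors' implementation. Every name here (`sgd_step`, `sit_feature_gradient`, `sit_step`, `lam`, `lr`) is hypothetical, parameters are plain lists of floats, and the gradients are assumed to have been computed elsewhere by backpropagation.

```python
from typing import List, Tuple

Vector = List[float]


def sgd_step(params: Vector, grad: Vector, lr: float) -> Vector:
    """Plain gradient-descent step: params - lr * grad."""
    return [p - lr * g for p, g in zip(params, grad)]


def sit_feature_gradient(grad_senone: Vector, grad_speaker: Vector,
                         lam: float) -> Vector:
    """Gradient applied to the shared feature extractor.

    The speaker-loss gradient is sign-flipped and scaled by lam (assumed
    weight), so descending along this vector lowers the senone loss while
    raising the speaker loss.
    """
    return [gs - lam * gk for gs, gk in zip(grad_senone, grad_speaker)]


def sit_step(theta_f: Vector, theta_y: Vector, theta_s: Vector,
             g_f_senone: Vector, g_f_speaker: Vector,
             g_y: Vector, g_s: Vector,
             lam: float, lr: float) -> Tuple[Vector, Vector, Vector]:
    """One illustrative update of all three parameter groups."""
    # Feature extractor: senone gradient minus lam times speaker gradient.
    new_f = sgd_step(theta_f, sit_feature_gradient(g_f_senone, g_f_speaker, lam), lr)
    # Senone classifier: ordinary descent on the senone loss.
    new_y = sgd_step(theta_y, g_y, lr)
    # Speaker classifier: ordinary descent on the speaker loss.
    new_s = sgd_step(theta_s, g_s, lr)
    return new_f, new_y, new_s


# Toy usage with made-up gradients (values chosen to be exact in binary).
new_f, new_y, new_s = sit_step(
    theta_f=[1.0, 2.0], theta_y=[0.0], theta_s=[4.0],
    g_f_senone=[0.5, -0.5], g_f_speaker=[1.0, 1.0],
    g_y=[2.0], g_s=[-2.0],
    lam=0.5, lr=0.5,
)
```

In this toy call the first feature parameter does not move, because the senone gradient (0.5) and the scaled, reversed speaker gradient (0.5 * 1.0) cancel exactly. In a real system the same effect is usually obtained by inserting a gradient-reversal layer between the feature extractor and the speaker classifier.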
