Speaker Adaptation for Attention-Based End-to-End Speech Recognition

Zhong Meng; Yashesh Gaur; Jinyu Li; Yifan Gong

arXiv:1911.03762·cs.CL·November 12, 2019

TL;DR

This paper introduces three regularization-based speaker adaptation methods for attention-based encoder-decoder (AED) end-to-end speech recognition that reduce word error rate when only limited adaptation data is available.

Contribution

The paper proposes three regularization techniques for adapting AED models to a target speaker: KLD regularization, adversarial speaker adaptation (ASA), and multi-task learning. Each is designed to work with very limited adaptation data.

Findings

01. Achieves up to a 12.2% WER reduction on a Microsoft dictation task.

02. Adaptation is effective with limited data in both supervised and unsupervised settings.

03. All three methods outperform the baseline speaker-independent model.

Abstract

We propose three regularization-based speaker adaptation approaches to adapt the attention-based encoder-decoder (AED) model with very limited adaptation data from target speakers for end-to-end automatic speech recognition. The first method is Kullback-Leibler divergence (KLD) regularization, in which the output distribution of a speaker-dependent (SD) AED is forced to be close to that of the speaker-independent (SI) model by adding a KLD regularization term to the adaptation criterion. To compensate for the asymmetric deficiency in KLD regularization, an adversarial speaker adaptation (ASA) method is proposed to regularize the deep-feature distribution of the SD AED through the adversarial learning of an auxiliary discriminator and the SD AED. The third approach is multi-task learning, in which an SD AED is trained to jointly perform the primary task of predicting a large number of…
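The KLD regularization described in the abstract can be illustrated with a small sketch. It assumes one common formulation, in which the per-token adaptation loss is cross-entropy against a target distribution interpolated between the one-hot label and the SI model's posterior. The weight `rho`, the function names, and the toy logits are illustrative assumptions, not taken from the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kld_regularized_loss(sd_logits, si_logits, target, rho):
    """Per-token loss for adapting the SD model (assumed formulation).

    Cross-entropy of the SD output against the interpolated target
    (1 - rho) * one_hot(target) + rho * p_SI.
    rho = 0 is plain cross-entropy; larger rho pulls the SD output
    distribution toward the SI output distribution.
    """
    log_p_sd = log_softmax(sd_logits)
    p_si = [math.exp(lp) for lp in log_softmax(si_logits)]
    # Build the interpolated target distribution.
    interp = [rho * q for q in p_si]
    interp[target] += 1.0 - rho
    return -sum(t * lp for t, lp in zip(interp, log_p_sd))

# Toy example: three output tokens, correct token index 0.
sd_logits = [2.0, 0.5, -1.0]   # hypothetical SD model outputs
si_logits = [1.0, 1.0, 0.0]    # hypothetical SI model outputs
loss = kld_regularized_loss(sd_logits, si_logits, target=0, rho=0.5)
```

Minimizing this interpolated cross-entropy with respect to the SD model is equivalent, up to a term that does not depend on the SD model, to minimizing the label cross-entropy plus a weighted KL divergence from the SI posterior to the SD posterior. In practice the loss would be averaged over all output tokens of the adaptation utterances.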
