Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition

Zhong Meng; Jinyu Li; Yashesh Gaur; Yifan Gong

arXiv:2001.01798·eess.AS·January 8, 2020·1 cites

TL;DR

This paper extends teacher-student learning to large-scale unsupervised domain adaptation for end-to-end speech recognition, introducing adaptive weighting of teacher and ground-truth knowledge to improve performance.

Contribution

It proposes adaptive teacher-student learning that dynamically combines teacher predictions and ground-truth labels for better domain adaptation in end-to-end speech models.

Findings

01. Achieved 6.3% relative WER reduction with T/S learning.

02. Achieved 10.3% relative WER reduction with adaptive T/S.

03. Validated on 3400 hours of Microsoft Cortana data.

Abstract

Teacher-student (T/S) learning has been shown to be effective for domain adaptation of deep neural network acoustic models in hybrid speech recognition systems. In this work, we extend T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: the teacher's token posteriors as soft labels and its one-best predictions as decoder guidance. To further improve T/S learning with the help of ground-truth labels, we propose adaptive T/S (AT/S) learning. Instead of conditionally choosing either the teacher's soft token posteriors or the one-hot ground-truth label, in AT/S the student always learns from both the teacher and the ground truth, with a pair of adaptive weights assigned to the soft and one-hot labels that quantify the confidence in each knowledge source. The confidence scores are dynamically estimated at…
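The weighted combination of soft and one-hot labels described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `adaptive_ts_loss`, the weight arguments `w_teacher` and `w_truth`, and the normalization of the two weights so they sum to one are all assumptions made for illustration. The abstract is truncated before it explains how the confidence scores are estimated, so the sketch simply takes them as inputs. The student logits are assumed to have been produced with the teacher's one-best tokens fed to the decoder, which the sketch does not model.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def adaptive_ts_loss(student_logits, teacher_posteriors, labels, w_teacher, w_truth):
    """Cross-entropy between the student and a weighted mix of two targets.

    student_logits:     (T, V) student decoder outputs, one row per step
    teacher_posteriors: (T, V) teacher token posteriors (soft labels)
    labels:             (T,)   ground-truth token ids (one-hot labels)
    w_teacher, w_truth: (T,)   non-negative per-step confidence weights;
                               hypothetical inputs, since how they are
                               estimated is not given in the text above
    """
    vocab_size = student_logits.shape[-1]
    one_hot = np.eye(vocab_size)[labels]

    # Assumption: normalize the weights so each mixed target is a distribution.
    total = w_teacher + w_truth
    target = (w_teacher[:, None] * teacher_posteriors
              + w_truth[:, None] * one_hot) / total[:, None]

    # Small epsilon guards against log(0) for very confident students.
    log_probs = np.log(softmax(student_logits) + 1e-12)
    return float(-(target * log_probs).sum(axis=-1).mean())


if __name__ == "__main__":
    # Toy values, chosen only to exercise the function.
    logits = np.array([[2.0, 0.0, -1.0], [0.5, 0.5, 0.5]])
    teacher = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
    labels = np.array([0, 1])
    weights = np.ones(2)
    print(adaptive_ts_loss(logits, teacher, labels, weights, weights))
```

Under these assumptions, setting `w_truth` to zero reduces the loss to plain T/S learning on the teacher's soft labels, and setting `w_teacher` to zero reduces it to ordinary cross-entropy training on the ground truth.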

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing