Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model

Ke Wang; Junbo Zhang; Yujun Wang; Lei Xie

arXiv:1803.10146 · cs.SD · January 1, 2019

TL;DR

This paper empirically compares three speaker adaptation methods (LIN, LHUC, KLD) on a TDNN-LSTM Mandarin speech model, analyzing their effectiveness with varying data sizes and speaker accent degrees.

Contribution

It provides the first comprehensive experimental comparison of multiple DNN-based speaker adaptation methods on Mandarin speech, including accented speakers.

Findings

01 LHUC outperforms LIN and KLD in most scenarios

02 Adaptation effectiveness increases with more data

03 Accent degree impacts adaptation performance

Abstract

Speaker adaptation aims to estimate a speaker-specific acoustic model from a speaker-independent one, minimizing the mismatch between training and testing conditions that arises from speaker variability. A variety of neural network adaptation methods have been proposed since deep learning models became mainstream. However, an experimental comparison of these methods is still lacking, especially now that DNN-based acoustic models have advanced greatly. In this paper, we aim to close this gap by providing an empirical evaluation of three typical speaker adaptation methods: LIN, LHUC and KLD. Adaptation experiments, with different sizes of adaptation data, are conducted on a strong TDNN-LSTM acoustic model. More challengingly, the source and target models considered here are a standard Mandarin speaker model and an accented Mandarin speaker model. We compare the…
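
For readers unfamiliar with the three methods named in the abstract, the following is a minimal Python sketch of the core idea behind each one, as these methods are commonly formulated in the adaptation literature. It is not taken from the paper or its repository: the function names, the toy vectors, and the interpolation weight rho are all hypothetical and chosen only for illustration, and the paper's exact configuration may differ.

```python
import math


def lin_identity(dim):
    """LIN (assumed form): an extra linear input layer, initialised to the
    identity so the adapted model starts out equal to the SI model."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def apply_lin(weight, bias, features):
    """Transform one input feature vector as W x + b."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weight, bias)]


def lhuc_scale(hidden, r):
    """LHUC (assumed form): rescale each hidden unit by 2 * sigmoid(r),
    a speaker-dependent amplitude in the open interval (0, 2)."""
    return [h * 2.0 / (1.0 + math.exp(-ri)) for h, ri in zip(hidden, r)]


def kld_targets(one_hot, si_posterior, rho):
    """KLD regularisation (assumed form): interpolate the hard labels with
    the speaker-independent model's posteriors using weight rho."""
    return [(1.0 - rho) * y + rho * p for y, p in zip(one_hot, si_posterior)]


# Toy usage with made-up numbers.
features = [0.5, -1.0]
lin_out = apply_lin(lin_identity(2), [0.0, 0.0], features)  # unchanged input
lhuc_out = lhuc_scale([1.0, 2.0], [0.0, 0.0])               # r = 0 gives scale 1
soft = kld_targets([0.0, 1.0, 0.0], [0.2, 0.7, 0.1], 0.5)   # softened targets
```

In all three cases the adapted model starts out identical to the speaker-independent one (identity transform, unit scaling, or rho close to 1 pulling the targets toward the SI posteriors), which is one reason these methods are considered usable with small amounts of adaptation data.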

Code & Models

Repositories

wangkenpu/Adaptation-Interspeech18 (Official)
