TL;DR
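This paper empirically compares three speaker adaptation methods, linear input network (LIN), learning hidden unit contributions (LHUC), and Kullback-Leibler divergence regularization (KLD), on a TDNN-LSTM Mandarin acoustic model, analyzing their effectiveness across adaptation data sizes and degrees of speaker accent.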
Contribution
It provides the first comprehensive experimental comparison of multiple DNN-based speaker adaptation methods on Mandarin speech, including accented speakers.
Findings
LHUC outperforms LIN and KLD in most scenarios
Adaptation effectiveness increases with more data
Accent degree impacts adaptation performance
Abstract
Speaker adaptation aims to estimate a speaker-specific acoustic model from a speaker-independent one, minimizing the mismatch between training and testing conditions that arises from speaker variability. A variety of neural network adaptation methods have been proposed since deep learning models became mainstream. However, an experimental comparison between these methods is still lacking, especially now that DNN-based acoustic models have advanced greatly. In this paper, we aim to close this gap by providing an empirical evaluation of three typical speaker adaptation methods: LIN, LHUC and KLD. Adaptation experiments with different sizes of adaptation data are conducted on a strong TDNN-LSTM acoustic model. More challengingly, the source and target models considered here are a standard Mandarin speaker model and an accented Mandarin speaker model. We compare the…
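Illustrative sketch
The abstract names LIN, LHUC and KLD without stating their formulas. The NumPy sketch below shows the core operation of each as it is commonly formulated in the adaptation literature; it is not the paper's implementation. All function and variable names (`lin_transform`, `lhuc_scale`, `kld_targets`, `rho`) are hypothetical, and the toy shapes stand in for real acoustic features and senone posteriors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lin_transform(features, weight, bias):
    """LIN: speaker-dependent affine transform applied to the input features.
    Typically initialized to identity so adaptation starts from the SI model."""
    return features @ weight.T + bias

def lhuc_scale(hidden, r):
    """LHUC: rescale each hidden unit by a speaker-dependent amplitude
    2 * sigmoid(r), which lies in (0, 2) and equals 1 when r = 0."""
    return hidden * (2.0 * sigmoid(r))

def kld_targets(one_hot, si_posteriors, rho):
    """KLD regularization: interpolate the hard labels with the posteriors of
    the speaker-independent (SI) model; rho controls the regularization weight."""
    return (1.0 - rho) * one_hot + rho * si_posteriors

def cross_entropy(targets, posteriors, eps=1e-12):
    """Frame-averaged cross-entropy against (possibly soft) targets."""
    return float(-np.mean(np.sum(targets * np.log(posteriors + eps), axis=1)))

# Toy data: 4 frames, 3-dim features, 5 hidden units, 3 output classes.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 3))
hidden = rng.standard_normal((4, 5))

lin_out = lin_transform(feats, np.eye(3), np.zeros(3))   # identity initialization
lhuc_out = lhuc_scale(hidden, np.zeros(5))               # r = 0 gives amplitude 1

one_hot = np.eye(3)[[0, 2, 1, 0]]
si_post = np.full((4, 3), 1.0 / 3.0)                     # placeholder SI posteriors
soft = kld_targets(one_hot, si_post, rho=0.5)
loss = cross_entropy(soft, si_post)
```

In all three cases the adapted model coincides with the speaker-independent one at initialization (identity transform, unit amplitudes) or stays close to it (large `rho`), which is why these methods are often used when adaptation data is limited.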
