TL;DR
This paper introduces a meta-learning approach that adapts all the weights of an acoustic model for speaker adaptation in speech recognition. It outperforms a strong LHUC baseline on a DNN acoustic model and performs comparably to LHUC in initial TDNN experiments.
Contribution
It presents a meta-learning framework that adapts all the weights of the acoustic model to a speaker, rather than a selected subset, and outperforms a strong LHUC baseline on a DNN acoustic model with 1.5M parameters.
Findings
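The meta-learner can learn to perform both supervised and unsupervised speaker adaptation.
It outperforms a strong LHUC baseline when adapting a DNN acoustic model with 1.5M parameters.
In initial experiments on TDNN acoustic models, it achieves performance comparable to LHUC.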
Abstract
The performance of automatic speech recognition systems can be improved by adapting an acoustic model to compensate for the mismatch between training and testing conditions, for example by adapting to unseen speakers. The success of speaker adaptation methods relies on selecting weights that are suitable for adaptation and using good adaptation schedules to update these weights in order not to overfit to the adaptation data. In this paper we investigate a principled way of adapting all the weights of the acoustic model using meta-learning. We show that the meta-learner can learn to perform supervised and unsupervised speaker adaptation and that it outperforms a strong baseline adapting LHUC parameters when adapting a DNN acoustic model (AM) with 1.5M parameters. We also report initial experiments on adapting TDNN AMs, where the meta-learner achieves comparable performance with LHUC.
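For readers unfamiliar with the LHUC baseline named in the abstract, the following is a minimal sketch of the idea as it is commonly described: each hidden unit gets one speaker-dependent amplitude, computed as 2·sigmoid(r), and only the r values are updated during adaptation. The abstract does not give this formula; the tanh nonlinearity, the toy sizes, and the "adapted" r values below are illustrative assumptions and are not taken from the paper. The parameter counts at the end show the contrast the abstract draws between adapting a small subset of parameters and adapting all the weights.

```python
import math


def lhuc_amplitude(r):
    """Map an unconstrained LHUC parameter to an amplitude in (0, 2)."""
    return 2.0 / (1.0 + math.exp(-r))


def hidden_layer(x, W, b, r=None):
    """tanh hidden layer; if r is given, rescale each unit by its LHUC amplitude."""
    out = []
    for j in range(len(b)):
        pre = sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        h = math.tanh(pre)  # speaker-independent activation (tanh is an assumption)
        if r is not None:
            h *= lhuc_amplitude(r[j])  # speaker-dependent rescaling
        out.append(h)
    return out


# Toy input and weights, chosen only for illustration.
x = [0.5, -1.0, 2.0]
W = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
b = [0.0, 0.1]

r_init = [0.0, 0.0]      # amplitude 1.0: identical to the unadapted layer
r_adapted = [1.0, -1.0]  # hypothetical values after adapting to one speaker

h_si = hidden_layer(x, W, b)
h_init = hidden_layer(x, W, b, r_init)
h_adapted = hidden_layer(x, W, b, r_adapted)

n_lhuc = len(r_init)                 # parameters LHUC adapts in this layer
n_all = len(W) * len(W[0]) + len(b)  # parameters an all-weights method adapts
```

With r initialised to zero the layer is unchanged, so adaptation starts from the speaker-independent model. LHUC sidesteps the weight-selection problem by fixing the adapted subset in advance, whereas the meta-learner in this paper is reported to adapt every weight.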
Taxonomy
Methods: Attention Model
