A Unified Speaker Adaptation Approach for ASR
Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma

TL;DR
This paper introduces a unified speaker adaptation method for ASR that combines feature adaptation, based on a speaker-aware persistent memory, with model adaptation, based on gradual pruning, and reports significant WER improvements on LibriSpeech.
Contribution
It proposes a novel combination of feature adaptation with speaker-aware memory and a gradual pruning-based model adaptation, avoiding architecture changes and reducing computational costs.
Findings
Achieves 2.74-6.52% relative WER reduction in general speaker adaptation.
Outperforms baseline and finetuning methods in target speaker adaptation.
Effective even with extremely low-resource data, improving WER with minimal training.
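The figures above are relative, not absolute, WER reductions. As a minimal sketch of how such a figure is computed, the snippet below uses the standard definition (baseline WER minus adapted WER, divided by baseline WER). The function name and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction, in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical values for illustration only (NOT taken from the paper):
# a baseline WER of 10.0% and an adapted WER of 9.5%.
example = relative_wer_reduction(10.0, 9.5)
print(f"relative WER reduction: {example:.2f}%")
```

Under this definition, a drop of 0.5 absolute points from a 10.0% baseline counts as a 5% relative reduction, so relative figures should always be read alongside the baseline WER.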
Abstract
Transformer models have been used successfully in automatic speech recognition (ASR) and yield state-of-the-art results. However, their performance is still affected by speaker mismatch between training and test data. Further finetuning a trained model with target speaker data is the most natural approach to adaptation, but it requires substantial compute and may cause catastrophic forgetting of existing speakers. In this work, we propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation. For feature adaptation, we employ a speaker-aware persistent memory model, which generalizes better to unseen test speakers by using speaker i-vectors to form a persistent memory. For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which to the best of our knowledge, has never…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Test · Pruning
