A Unified Speaker Adaptation Approach for ASR

Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma

arXiv:2110.08545 · eess.AS · October 19, 2021

TL;DR

This paper introduces a unified speaker adaptation method for ASR that combines feature adaptation and model adaptation, using a speaker-aware persistent memory and gradual pruning, and achieves significant word error rate (WER) improvements on LibriSpeech.

Contribution

It proposes a novel combination of feature adaptation, based on a speaker-aware persistent memory, and model adaptation, based on gradual pruning. The approach avoids changes to the model architecture and costs less compute than finetuning the whole model.
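The gradual-pruning idea can be illustrated with a minimal pure-Python sketch. It assumes the cubic sparsity schedule of Zhu and Gupta (2017) with magnitude-based pruning, and it further assumes that the pruned (zeroed) weights are the only ones updated on target speaker data while the surviving weights stay frozen. The function names `sparsity_at`, `magnitude_mask`, `apply_mask` and `adapt_step` are hypothetical and are not taken from the zyzpower/gradprune_speaker repository.

```python
def sparsity_at(step, final_sparsity, begin_step, end_step, initial_sparsity=0.0):
    """Cubic gradual-pruning schedule (assumed): sparsity rises quickly, then levels off."""
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the smallest-magnitude fraction of weights."""
    num_pruned = int(sparsity * len(weights))  # rounded down
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(by_magnitude[:num_pruned])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out pruned weights; the general model keeps only the mask == 1 entries."""
    return [w * m for w, m in zip(weights, mask)]


def adapt_step(weights, grads, mask, lr):
    """One SGD step on target-speaker data that updates ONLY the pruned slots.

    Weights with mask == 1 stay frozen, so re-applying the mask recovers the
    pruned general model (assumed here as the guard against forgetting).
    """
    return [w - lr * g if m == 0 else w for w, g, m in zip(weights, grads, mask)]


# Usage: prune half of a toy weight vector, then adapt the freed slots.
general = [0.1, -2.0, 0.3, 1.5]
mask = magnitude_mask(general, sparsity_at(100, 0.5, 0, 100))
pruned = apply_mask(general, mask)
adapted = adapt_step(pruned, [1.0, 1.0, 1.0, 1.0], mask, lr=0.1)
print(mask, adapted)
```

Because only the freed slots change, every layer keeps its original shape, which is one way adaptation can proceed without modifying the architecture.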

Findings

1. Achieves a 2.74-6.52% relative WER reduction in general speaker adaptation.
2. Outperforms the baseline and finetuning methods in target speaker adaptation.
3. Remains effective even with extremely low-resource data, improving WER with minimal training.
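For reference, the word error rate and the relative reduction quoted in the findings can be computed as follows: WER = (substitutions + deletions + insertions) / number of reference words. The helper names `word_error_rate` and `relative_wer_reduction` are illustrative, and the example inputs are made up rather than results from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row dynamic programming: prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer, adapted_wer):
    """Relative reduction in percent of the baseline WER."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical inputs, for illustration only.
print(word_error_rate("a b c d", "a x c"))
print(relative_wer_reduction(10.0, 9.5))
```

A relative reduction is a percentage of the baseline WER, not an absolute difference in WER points.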

Abstract

Transformer models have been used successfully in automatic speech recognition (ASR) and yield state-of-the-art results. However, their performance is still affected by speaker mismatch between training and test data. Further finetuning a trained model with target speaker data is the most natural approach for adaptation, but it takes a lot of compute and may cause catastrophic forgetting of the existing speakers. In this work, we propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation. For feature adaptation, we employ a speaker-aware persistent memory model which generalizes better to unseen test speakers by making use of speaker i-vectors to form a persistent memory. For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which, to the best of our knowledge, has never…
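To make the feature-adaptation idea concrete, here is a minimal pure-Python sketch of self-attention augmented with speaker-derived memory slots. It assumes, for illustration only, that each memory slot's key and value come from a linear projection of the speaker's i-vector and are appended to the frame-level keys and values, in the spirit of persistent-memory attention. The projection matrices and the names `speaker_memory` and `attend_with_memory` are hypothetical, not the authors' code.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def speaker_memory(ivector, key_projs, value_projs):
    """Assumed mapping: one (key, value) memory slot per projection matrix."""
    keys = [matvec(W, ivector) for W in key_projs]
    values = [matvec(W, ivector) for W in value_projs]
    return keys, values


def attend_with_memory(queries, keys, values, mem_keys, mem_values):
    """Scaled dot-product attention with memory slots appended to keys/values."""
    all_keys = keys + mem_keys
    all_values = values + mem_values
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in all_keys]
        weights = softmax(scores)
        dim = len(all_values[0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, all_values))
                        for j in range(dim)])
    return outputs


# Usage: one frame, one memory slot, a 1-dim "i-vector" projected to 2 dims.
frames_q = [[0.0, 0.0]]
frames_k = [[1.0, 0.0]]
frames_v = [[2.0, 0.0]]
key_projs = [[[0.0], [1.0]]]
value_projs = [[[0.0], [4.0]]]
mem_k, mem_v = speaker_memory([1.0], key_projs, value_projs)
print(attend_with_memory(frames_q, frames_k, frames_v, mem_k, mem_v))
```

Because the memory depends only on the i-vector, the same frame-level weights see a different attention context for each speaker; a real model would presumably learn the projections jointly with the Transformer.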

Code & Models

Repositories

zyzpower/gradprune_speaker
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Test · Pruning