Domain Adaptation Using Class Similarity for Robust Speech Recognition

Han Zhu; Jiangjiang Zhao; Yuling Ren; Li Wang; Pengyuan Zhang

arXiv:2011.02782 · eess.AS · November 6, 2020

TL;DR

This paper introduces a novel domain adaptation method for speech recognition that leverages class similarity through mean soft labels, improving performance under domain mismatch and data scarcity.

Contribution

The proposed approach uses class similarity, captured as mean soft labels, to adapt a DNN acoustic model, and it outperforms fine-tuning with one-hot labels.

Findings

1. Outperforms fine-tuning with one-hot labels in accent adaptation.
2. Effective in noise and accent mismatch scenarios.
3. Improves robustness of speech recognition models.

Abstract

When only limited target domain data is available, domain adaptation can improve the performance of a deep neural network (DNN) acoustic model by leveraging a well-trained source model together with the target domain data. However, domain mismatch and data sparsity make domain adaptation very challenging. This paper proposes a novel adaptation method for DNN acoustic models using class similarity. The output distribution of a DNN model contains knowledge of the similarity among classes. Because this knowledge applies to both the source and target domains, it can be transferred from the source model to the target model to improve performance. In our approach, we first compute the frame-level posterior probabilities of source samples using the source model. Then, for each class, the probabilities of this class are used to compute a mean vector, which we refer to as mean soft labels. During adaptation, these…
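
The computation of mean soft labels described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `mean_soft_labels`, the assumption that each frame is grouped by an aligned class label, and the toy posterior values are all hypothetical. It covers only the averaging step, not how the resulting labels are used during adaptation.

```python
def mean_soft_labels(posteriors, labels, num_classes):
    """Average source-model posteriors per class (illustrative sketch).

    posteriors: list of per-frame probability vectors from the source model
    labels:     list of class indices, one per frame (assumed to come from
                an alignment; this grouping is an assumption of the sketch)
    Returns a list with one mean vector per class, or None for a class
    that has no frames.
    """
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes

    # Accumulate the posterior vectors of all frames belonging to each class.
    for post, label in zip(posteriors, labels):
        counts[label] += 1
        for k, p in enumerate(post):
            sums[label][k] += p

    # Divide by the frame count to obtain the per-class mean vector.
    means = []
    for c in range(num_classes):
        if counts[c] == 0:
            means.append(None)
        else:
            means.append([s / counts[c] for s in sums[c]])
    return means


# Toy data (hypothetical): three frames, three classes.
posteriors = [
    [0.8, 0.2, 0.0],  # frame labeled class 0
    [0.6, 0.2, 0.2],  # frame labeled class 0
    [0.1, 0.9, 0.0],  # frame labeled class 1
]
labels = [0, 0, 1]
soft = mean_soft_labels(posteriors, labels, num_classes=3)
```

In this toy case the mean vector for class 0 places most of its mass on class 0 and the rest on classes 1 and 2, which is the kind of inter-class similarity information that a one-hot label discards.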

Code & Models

Repositories

zhu-han/ASR-Adaption-Class-Similarity
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing