Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching

Wenxin Hou; Jindong Wang; Xu Tan; Tao Qin; Takahiro Shinozaki

arXiv:2104.07491 · cs.SD · June 10, 2021

TL;DR

This paper introduces CMatch, an unsupervised character-level distribution matching method for domain adaptation in speech recognition, which achieves 14.39% and 16.50% relative word error rate (WER) reductions in cross-device and cross-environment adaptation on Libri-Adapt.
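The reported gains are relative, not absolute, WER reductions: a relative reduction measures what fraction of the baseline WER is removed. A minimal sketch of that arithmetic follows; the function name and the WER values in the usage line are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - adapted) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer * 100.0


# Hypothetical WERs in percent (illustrative only, not taken from the paper).
print(round(relative_wer_reduction(20.0, 17.0), 2))  # 15.0
```

Under this definition, a 14.39% relative reduction from a baseline of B% WER corresponds to an absolute drop of 0.1439 × B percentage points.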

Contribution

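It proposes a fine-grained unsupervised domain adaptation technique for ASR that matches character-level feature distributions between domains, using CTC pseudo labels for frame-level label assignment together with self-training, to improve recognition on the target domain.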
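Character-level distribution matching can be written as a per-character Maximum Mean Discrepancy (MMD), the criterion named in the abstract. The formula below is a sketch under assumed notation: S_c and T_c are the source and target frame features assigned to character c (n_c and m_c frames respectively), k is a kernel such as a Gaussian, and 𝒞 is the set of characters observed in both domains. The biased empirical estimator and the uniform average over characters are assumptions, not necessarily the paper's exact formulation.

```latex
\widehat{\mathrm{MMD}}^2(S_c, T_c)
  = \frac{1}{n_c^2}\sum_{i=1}^{n_c}\sum_{i'=1}^{n_c} k\!\left(x^{s}_{i}, x^{s}_{i'}\right)
  + \frac{1}{m_c^2}\sum_{j=1}^{m_c}\sum_{j'=1}^{m_c} k\!\left(x^{t}_{j}, x^{t}_{j'}\right)
  - \frac{2}{n_c m_c}\sum_{i=1}^{n_c}\sum_{j=1}^{m_c} k\!\left(x^{s}_{i}, x^{t}_{j}\right),
\qquad
\mathcal{L}_{\mathrm{char}} = \frac{1}{|\mathcal{C}|}\sum_{c \in \mathcal{C}} \widehat{\mathrm{MMD}}^2(S_c, T_c)
```

During adaptation, a term of this form would be added to the ASR training loss with some weight; that weighting is likewise an assumption here.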

Findings

01

Achieves 14.39% and 16.50% relative WER reductions in the cross-device and cross-environment settings of Libri-Adapt.

02

Matches the feature distribution of each character between the source and target domains, a finer-grained alignment than matching the two domains as a whole.

03

Analyzes strategies for frame-level label assignment and for model adaptation.
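One simple frame-level label assignment strategy takes the per-frame argmax of the CTC posteriors as each frame's pseudo label and ignores frames where the blank symbol wins. The NumPy sketch below assumes blank has index 0, marks ignored frames with -1, and uses a hypothetical function name; the paper compares several assignment strategies, and this greedy variant is only one possibility.

```python
import numpy as np

BLANK_ID = 0  # assumption: index 0 of the vocabulary is the CTC blank symbol


def assign_frame_labels(ctc_log_probs: np.ndarray, blank_id: int = BLANK_ID) -> np.ndarray:
    """Greedy per-frame pseudo labels from CTC posteriors.

    ctc_log_probs: (T, V) array of per-frame log-probabilities over the vocabulary.
    Returns a length-T int array holding the argmax character of each frame,
    or -1 where the argmax is blank (such frames are skipped during matching).
    """
    labels = ctc_log_probs.argmax(axis=-1)
    return np.where(labels == blank_id, -1, labels)


# Hypothetical posteriors for 4 frames over a 3-symbol vocabulary (blank, 'a', 'b').
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.1, 0.7],
                  [0.6, 0.3, 0.1]])
print(assign_frame_labels(np.log(probs)))  # [-1  1  2 -1]
```

On the target domain no transcripts exist, so labels like these come from the model's own predictions, which is why they are pseudo labels rather than ground truth.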

Abstract

End-to-end automatic speech recognition (ASR) can achieve promising performance with large-scale training data. However, it is known that domain mismatch between training and testing data often leads to a degradation of recognition accuracy. In this work, we focus on unsupervised domain adaptation for ASR and propose CMatch, a Character-level distribution matching method that performs fine-grained adaptation for each character across the two domains. First, to obtain labels for the features belonging to each character, we perform frame-level label assignment using Connectionist Temporal Classification (CTC) pseudo labels. Then, we match the character-level distributions using Maximum Mean Discrepancy (MMD). We train our algorithm using self-training. Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER)…
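The two steps described in the abstract (assigning a character to each frame, then matching each character's distribution with MMD) can be sketched in NumPy as below. This is a minimal sketch, assuming frame features and frame labels are already available (with -1 marking ignored frames), a single Gaussian kernel with bandwidth sigma = 1.0, and a uniform average over characters present in both domains; all function names are hypothetical. The official code is PyTorch and may differ in kernel choice and weighting.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical squared MMD between samples x (n, d) and y (m, d)."""
    return float(gaussian_kernel(x, x, sigma).mean()
                 + gaussian_kernel(y, y, sigma).mean()
                 - 2.0 * gaussian_kernel(x, y, sigma).mean())


def char_level_mmd(src_feats, src_labels, tgt_feats, tgt_labels, sigma=1.0) -> float:
    """Average per-character MMD over characters present in both domains.

    *_feats: (N, d) frame features; *_labels: (N,) frame labels, -1 = ignore.
    """
    shared = (set(src_labels.tolist()) & set(tgt_labels.tolist())) - {-1}
    if not shared:
        return 0.0  # nothing to match in this batch
    losses = [mmd2(src_feats[src_labels == c], tgt_feats[tgt_labels == c], sigma)
              for c in sorted(shared)]
    return float(np.mean(losses))


rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 4))      # hypothetical frame features
labels = np.array([0, 1, 2] * 10)     # hypothetical frame labels
print(char_level_mmd(feats, labels, feats, labels))            # 0.0
print(char_level_mmd(feats, labels, feats + 3.0, labels) > 0)  # True
```

Identical feature sets give zero loss and shifted target features give a positive loss, which is the discrepancy an adaptation objective of this kind would drive down.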

Code & Models

Repositories

jindongwang/transferlearning/tree/master/code/ASR/CMatch (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Dropout · Adam · Layer Normalization · Label Smoothing · Byte Pair Encoding