MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition

Jiaming Zhou; Shiwan Zhao; Ning Jiang; Guoqing Zhao; Yong Qin

arXiv:2302.11224 · cs.CL · February 23, 2023

TL;DR

This paper introduces MADI, an unsupervised domain adaptation method for speech recognition that improves both transferability and discriminability, reducing relative word error rate (WER) by 17.7% on cross-device tasks and 22.8% on cross-environment tasks.

Contribution

MADI combines inter-domain matching with intra-domain discrimination to improve cross-domain speech recognition performance.

Findings

01 Reduces relative WER by 17.7% on cross-device tasks.

02 Reduces relative WER by 22.8% on cross-environment tasks.

03 Demonstrates effectiveness on the Libri-Adapt dataset.
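
The percentages above are relative reductions, i.e. the drop in WER expressed as a fraction of the baseline WER rather than a difference in absolute points. A minimal sketch of that arithmetic follows; the function name and the example numbers are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical numbers for illustration only (not taken from the paper):
# a baseline WER of 20.0% that drops to 16.0% after adaptation.
baseline = 20.0
adapted = 16.0
print(relative_wer_reduction(baseline, adapted))  # prints 20.0
```

In this made-up case the absolute drop is 4 points, while the relative reduction is 20%.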

Abstract

End-to-end automatic speech recognition (ASR) usually suffers from performance degradation when applied to a new domain due to domain shift. Unsupervised domain adaptation (UDA) aims to improve the performance on the unlabeled target domain by transferring knowledge from the source to the target domain. To improve transferability, existing UDA approaches mainly focus on matching the distributions of the source and target domains globally and/or locally, while ignoring the model discriminability. In this paper, we propose a novel UDA approach for ASR via inter-domain MAtching and intra-domain DIscrimination (MADI), which improves the model transferability by fine-grained inter-domain matching and discriminability by intra-domain contrastive discrimination simultaneously. Evaluations on the Libri-Adapt dataset demonstrate the effectiveness of our approach. MADI reduces the relative word…
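
The abstract does not spell out the form of the contrastive objective. As a rough illustration of what contrastive discrimination usually means, the sketch below computes a generic InfoNCE-style loss over plain Python vectors: the loss is low when an anchor embedding is closer to its positive than to its negatives. This is an assumption made for illustration and not necessarily the loss MADI uses; the function names, the `temperature` value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative, not the paper's exact objective)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings, hypothetical values.
anchor = [1.0, 0.0]
near = [0.9, 0.1]   # similar to the anchor
far = [0.0, 1.0]    # dissimilar to the anchor
other = [-1.0, 0.0]

well_separated = info_nce(anchor, near, [far, other])
poorly_separated = info_nce(anchor, far, [near, other])
print(well_separated < poorly_separated)  # prints True
```

Minimizing a loss of this kind pulls matching representations together and pushes non-matching ones apart, which is the general sense in which a contrastive term can improve discriminability.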

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing