Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning

Nilaksh Das; Sravan Bodapati; Monica Sunkara; Sundararajan Srinivasan; Duen Horng Chau

arXiv:2103.05834·eess.AS·March 11, 2021·Interspeech

TL;DR

This paper introduces Accent Pre-Training (Acc-PT), a semi-supervised training strategy that combines transfer learning with adversarial training to improve accented speech recognition when annotated accented data is scarce.

Contribution

It proposes a novel semi-supervised approach combining transfer learning and adversarial training for robust accented speech recognition.

Findings

01 Achieves a 33% average improvement over the baseline across multiple accents.

02 Effective with as little as 105 minutes of unannotated accented speech.

03 Improves ASR performance using annotated data from only one standard accent.

Abstract

Training deep neural networks for automatic speech recognition (ASR) requires large amounts of transcribed speech. This becomes a bottleneck for training robust models for accented speech which typically contains high variability in pronunciation and other semantics, since obtaining large amounts of annotated accented data is both tedious and costly. Often, we only have access to large amounts of unannotated speech from different accents. In this work, we leverage this unannotated data to provide semantic regularization to an ASR model that has been trained only on one accent, to improve its performance for multiple accents. We propose Accent Pre-Training (Acc-PT), a semi-supervised training strategy that combines transfer learning and adversarial training. Our approach improves the performance of a state-of-the-art ASR model by 33% on average over the baseline across multiple accents,…
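The abstract is cut off before it says how the adversarial component is wired in. One common way to combine a supervised objective with an adversarial one is a gradient reversal layer: it is the identity in the forward pass, and it flips and scales the gradient in the backward pass, so a shared encoder is pushed to make an auxiliary classifier's job harder. The sketch below illustrates only that general idea with plain Python lists. It is an assumption, not the paper's implementation, and the names `combined_encoder_grad`, `asr_grad`, `accent_grad`, and `lam`, as well as the numbers, are hypothetical.

```python
def grad_reverse_forward(x):
    """Forward pass of a gradient reversal layer: the identity."""
    return x


def grad_reverse_backward(upstream_grad, lam):
    """Backward pass: flip the sign of the gradient and scale it by lam."""
    return -lam * upstream_grad


def combined_encoder_grad(asr_grad, accent_grad, lam=1.0):
    """Gradient reaching a shared encoder in a hypothetical adversarial setup.

    asr_grad:    gradient of a transcription loss (needs transcribed speech)
    accent_grad: gradient of an accent-classifier loss (needs accent labels only)
    lam:         hypothetical weight on the reversed adversarial gradient
    """
    return [a + grad_reverse_backward(d, lam) for a, d in zip(asr_grad, accent_grad)]


# Illustrative values only, not taken from the paper.
asr_grad = [0.5, -0.25]
accent_grad = [0.25, 0.5]
print(combined_encoder_grad(asr_grad, accent_grad, lam=0.5))  # [0.375, -0.5]
```

In this sketch the accent term reaches the encoder with its sign flipped, which is what would discourage accent-specific features under the stated assumptions; setting `lam=0.0` recovers purely supervised training.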
