Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition

Gurunath Reddy Madhumani; Sanket Shah; Basil Abraham; Vikas Joshi; Sunayana Sitaram

arXiv:2006.05257·eess.AS·June 11, 2020·5 cites

Open Access

TL;DR

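This paper introduces a task-agnostic training approach based on domain adversarial learning to improve speech recognition accuracy on both monolingual and code-switched speech. It targets two problems: the scarcity of code-switched training data, and the loss of monolingual accuracy that occurs when a monolingual ASR system is fine-tuned on code-switched data.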

Contribution

It proposes a novel domain adversarial training method that creates shared representations for monolingual and code-switched speech recognition, enhancing performance across multiple language pairs.
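One common way to realize domain adversarial training is a gradient reversal step (Ganin and Lempitsky): the discriminator's gradient is sign-flipped and scaled before it reaches the shared layers, so those layers are pushed toward features the discriminator cannot separate. The sketch below illustrates only that update rule on plain lists of numbers. The function names, the scaling factor `lam`, and the learning rate `lr` are assumptions made for illustration; this summary does not say whether the authors' setup matches this form exactly.

```python
from typing import List


def grad_reverse(grad: List[float], lam: float) -> List[float]:
    """Backward rule of a gradient reversal step: flip the sign, scale by lam."""
    return [-lam * g for g in grad]


def shared_layer_update(
    params: List[float],
    task_grad: List[float],   # d(ASR task loss) / d(shared params)
    disc_grad: List[float],   # d(discriminator loss) / d(shared params)
    lam: float,               # hypothetical adversarial weight
    lr: float,                # hypothetical learning rate
) -> List[float]:
    """One SGD step on the shared parameters.

    The task gradient is followed as usual, while the discriminator
    gradient is reversed, so the shared layers move to *increase* the
    discriminator's loss (i.e. to hide the monolingual vs. code-switched cue).
    """
    reversed_disc = grad_reverse(disc_grad, lam)
    return [
        p - lr * (g_task + g_adv)
        for p, g_task, g_adv in zip(params, task_grad, reversed_disc)
    ]


if __name__ == "__main__":
    new_params = shared_layer_update(
        params=[1.0, 2.0],
        task_grad=[0.5, 0.0],
        disc_grad=[1.0, -1.0],
        lam=0.5,
        lr=0.1,
    )
    print(new_params)
```

The discriminator itself would still be updated with its ordinary (unreversed) gradient; only the shared layers see the flipped sign.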

Findings

01 Reductions in Word Error Rates for both monolingual and code-switched speech

02 Shared-layer parameters trained against an adversarial discriminator are task-agnostic

03 Improved robustness of ASR systems across different language scenarios
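Word Error Rate, the metric named in the first finding, is conventionally the word-level edit distance between the reference transcript and the ASR hypothesis (substitutions, deletions, and insertions), divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization; the function name and the example strings are hypothetical, and the paper's exact scoring setup is not given in this summary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word dropped)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.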

Abstract

Recognizing code-switched speech is challenging for Automatic Speech Recognition (ASR) for a variety of reasons, including the lack of code-switched training data. Recently, we showed that monolingual ASR systems fine-tuned on code-switched data deteriorate in performance on monolingual speech recognition, which is not desirable as ASR systems deployed in multilingual scenarios should recognize both monolingual and code-switched speech with high accuracy. Our experiments indicated that this loss in performance could be mitigated by using certain strategies for fine-tuning and regularization, leading to improvements in both monolingual and code-switched ASR. In this work, we present further improvements over our previous work by using domain adversarial learning to train task agnostic models. We evaluate the classification accuracy of an adversarial discriminator and show that it can…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Domain Adaptation and Few-Shot Learning