Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition
Gurunath Reddy Madhumani, Sanket Shah, Basil Abraham, Vikas Joshi, Sunayana Sitaram

TL;DR
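This paper introduces a task-agnostic training approach based on domain adversarial learning to improve speech recognition accuracy on both monolingual and code-switched speech. It targets two problems: the scarcity of code-switched training data, and the drop in monolingual accuracy that occurs when a monolingual ASR system is fine-tuned on code-switched data.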
Contribution
It proposes a novel domain adversarial training method that creates shared representations for monolingual and code-switched speech recognition, enhancing performance across multiple language pairs.
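The summary does not say how the adversarial objective is implemented. One common formulation of domain adversarial training is gradient reversal: the discriminator is trained to tell the two tasks apart, while the shared layer receives the negated discriminator gradient, so its features become harder to classify. The sketch below illustrates that idea on a toy scalar model. Every name in it (`dann_step`, the parameters `w`, `u`, `v`, the weight `lam`) and the squared-error task loss are assumptions made for illustration, not the authors' architecture or loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dann_step(params, x, y, d, lr=0.1, lam=1.0):
    """One gradient-reversal update on a toy scalar model (illustrative only).

    params: {"w": shared weight, "u": task-head weight, "v": discriminator weight}
    x: input, y: task target, d: task label (0 = monolingual, 1 = code-switched)
    lam: hypothetical weight on the reversed discriminator gradient
    """
    w, u, v = params["w"], params["u"], params["v"]

    # Forward pass: shared feature, task prediction, discriminator probability.
    h = w * x
    y_hat = u * h
    p = sigmoid(v * h)

    # Error terms for a squared-error task loss and a cross-entropy discriminator loss.
    task_err = y_hat - y
    disc_err = p - d

    # Head gradients: each head minimises its own loss.
    grad_u = task_err * h
    grad_v = disc_err * h

    # Shared-layer gradient: task gradient minus lam times the discriminator
    # gradient. The sign flip is the "gradient reversal".
    grad_w_task = task_err * u * x
    grad_w_disc = disc_err * v * x
    grad_w = grad_w_task - lam * grad_w_disc

    return {"w": w - lr * grad_w, "u": u - lr * grad_u, "v": v - lr * grad_v}


start = {"w": 1.0, "u": 1.0, "v": 1.0}
after = dann_step(start, x=1.0, y=1.0, d=1, lr=0.1, lam=1.0)
print(after)
```

In this example the task error is zero, so the only change to the shared weight `w` comes from the reversed discriminator gradient: `v` moves to classify the input better, while `w` moves in the direction that makes classification harder. Setting `lam=0.0` removes the adversarial term and leaves `w` unchanged.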
Findings
Reductions in Word Error Rates for monolingual and code-switched speech
Shared-layer parameters learned with an adversarial discriminator are task-agnostic
Improved robustness of ASR systems across different language scenarios
Abstract
Recognizing code-switched speech is challenging for Automatic Speech Recognition (ASR) for a variety of reasons, including the lack of code-switched training data. Recently, we showed that monolingual ASR systems fine-tuned on code-switched data deteriorate in performance on monolingual speech recognition, which is not desirable as ASR systems deployed in multilingual scenarios should recognize both monolingual and code-switched speech with high accuracy. Our experiments indicated that this loss in performance could be mitigated by using certain strategies for fine-tuning and regularization, leading to improvements in both monolingual and code-switched ASR. In this work, we present further improvements over our previous work by using domain adversarial learning to train task agnostic models. We evaluate the classification accuracy of an adversarial discriminator and show that it can…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Domain Adaptation and Few-Shot Learning
