Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization
Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

TL;DR
This paper introduces a neural network framework that jointly models monolingual and code-switched speech recognition, improving bilingual ASR by leveraging conditional factorization to handle different utterance types effectively.
Contribution
It proposes a novel conditionally factorized joint modeling approach for bilingual ASR that unifies monolingual and code-switched speech recognition within a single neural network.
Findings
Effective on Mandarin-English bilingual speech data
Improves recognition accuracy for code-switched utterances
Unified framework handles monolingual and code-switched speech
Abstract
Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint modeling framework can be conditionally factorized such that the final bilingual output, which may or may not be code-switched, is obtained given only monolingual information. We show that this conditionally factorized joint framework can be modeled by an end-to-end differentiable neural network. We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora.
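The conditional factorization described in the abstract can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: $X$ is the acoustic input, $Y$ is the bilingual output, and $Z^{M}$ and $Z^{E}$ are the frame-synchronous Mandarin and English label sequences. The first line is the chain rule; the second states the assumption that the bilingual output depends only on the monolingual information.

```latex
\begin{align}
p(Y, Z^{M}, Z^{E} \mid X) &= p(Y \mid Z^{M}, Z^{E}, X)\, p(Z^{M}, Z^{E} \mid X) \\
p(Y \mid Z^{M}, Z^{E}, X) &\approx p(Y \mid Z^{M}, Z^{E})
\end{align}
```

Under this reading, the monolingual sub-tasks account for all dependence on the audio, and the bilingual sub-task only combines their outputs. The paper's exact formulation may differ in how the monolingual terms are further decomposed.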
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research
