Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Brian Yan; Chunlei Zhang; Meng Yu; Shi-Xiong Zhang; Siddharth Dalmia; Dan Berrebbi; Chao Weng; Shinji Watanabe; Dong Yu

arXiv:2111.15016·cs.CL·December 1, 2021

TL;DR

This paper introduces a neural network framework that jointly models monolingual and code-switched speech recognition. It uses conditional factorization so that the final bilingual output, code-switched or not, is derived from monolingual information only.

Contribution

It proposes a novel conditionally factorized joint modeling approach for bilingual ASR that unifies monolingual and code-switched speech recognition within a single neural network.

Findings

1. Effective on Mandarin-English bilingual speech data
2. Improves recognition accuracy for code-switched utterances
3. Unified framework handles monolingual and code-switched speech

Abstract

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint modeling framework can be conditionally factorized such that the final bilingual output, which may or may not be code-switched, is obtained given only monolingual information. We show that this conditionally factorized joint framework can be modeled by an end-to-end differentiable neural network. We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora.
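
The conditional factorization described in the abstract can be sketched as follows. This is an illustrative reading only: the symbols X, Y, Z^M and Z^E are assumed notation introduced here, not taken from the text above, and the exact form used in the paper may differ.

```latex
% Assumed notation (hypothetical, for illustration):
%   X        : input speech features
%   Y        : final bilingual label sequence (may or may not be code-switched)
%   Z^M, Z^E : frame-synchronous monolingual label sequences (Mandarin, English)
\begin{align}
p(Y, Z^{M}, Z^{E} \mid X)
  &= p(Y \mid Z^{M}, Z^{E}, X)\, p(Z^{M}, Z^{E} \mid X)
  && \text{(chain rule, exact)} \\
  &\approx p(Y \mid Z^{M}, Z^{E})\, p(Z^{M}, Z^{E} \mid X)
  && \text{(assumed: $Y$ independent of $X$ given $Z^{M}, Z^{E}$)}
\end{align}
```

Under this reading, the second factor covers the monolingual sub-tasks, and the first factor produces the bilingual output from monolingual information alone, which is what allows both parts to sit inside one end-to-end differentiable network.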

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research