Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss
Mohammad Soleymanpour, Mahmoud Al Ismail, Fahimeh Bahmaninezhad, Kshitiz Kumar, Jian Wu

TL;DR
This paper presents a novel bilingual streaming ASR model using grapheme units and auxiliary monolingual loss, significantly improving code-mixing recognition and monolingual performance in Spanish and Italian tasks.
Contribution
It introduces a fully bilingual alignment and streaming transformer model with a parallel encoder and auxiliary monolingual loss, enhancing bilingual ASR performance.
Findings
Bilingual models achieve strong code-mixing recognition.
Auxiliary monolingual loss outperforms LID loss in encoder specialization.
Italian bilingual model reduces WER from 46.5% to 13.8%.
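For scale, the Italian figures quoted above can be turned into a relative reduction. The short sketch below only does that arithmetic; the helper name `relative_wer_reduction` is hypothetical and not from the paper, and the summary does not say which test set or baseline the two numbers refer to.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer

# Figures from the Findings list: 46.5% WER reduced to 13.8% WER.
absolute_points = 46.5 - 13.8
relative = relative_wer_reduction(46.5, 13.8)

print(f"absolute reduction: {absolute_points:.1f} points")  # 32.7 points
print(f"relative reduction: {relative:.1%}")                # 70.3%
```

That is a drop of 32.7 absolute WER points, or about 70% relative to the starting value.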
Abstract
We introduce a bilingual solution to support English as secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments constitute: (a) pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and subsequently bilingual streaming transformer model, (c) a parallel encoder structure with language identification (LID) loss, (d) parallel encoder with an auxiliary loss for monolingual projections. We conclude that in comparison to LID loss, our proposed auxiliary loss is superior in specializing the parallel encoders to respective monolingual locales, and that contributes to stronger bilingual learning. We evaluate our work on large-scale training and test tasks for bilingual Spanish (ES) and bilingual Italian (IT) applications. Our bilingual models demonstrate strong English code-mixing…
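The abstract names an auxiliary loss on monolingual projections of the parallel encoders, but the portion shown here does not give its formula. As a rough illustration only, the sketch below shows one way such a term could be added to a main bilingual loss for a single frame. Everything in it is an assumption made for illustration and not the authors' formulation: the function names, the per-frame language tag, the `aux_weight` value, and the use of cross-entropy for both terms.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one frame: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]

def frame_loss(bilingual_logits, bilingual_target,
               mono_logits_by_lang, frame_lang, mono_target,
               aux_weight=0.5):
    """Hypothetical combined loss: main bilingual term plus a weighted
    auxiliary term on the monolingual projection of the encoder branch
    that matches the frame's language (an assumption, not the paper's)."""
    main = cross_entropy(bilingual_logits, bilingual_target)
    aux = cross_entropy(mono_logits_by_lang[frame_lang], mono_target)
    return main + aux_weight * aux

# Toy frame: 4 bilingual output units, 2 units per monolingual projection.
loss = frame_loss(
    bilingual_logits=[0.0, 0.0, 0.0, 0.0],
    bilingual_target=0,
    mono_logits_by_lang={"primary": [0.0, 0.0], "en": [0.0, 0.0]},
    frame_lang="en",
    mono_target=1,
)
print(round(loss, 4))
```

With uniform logits the main term is log 4 and the auxiliary term is log 2, so the toy value is easy to check by hand. In this reading, the auxiliary term pushes each branch toward its own locale, which is the specialization effect the abstract attributes to the proposed loss.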
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
