Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

Mohammad Soleymanpour; Mahmoud Al Ismail; Fahimeh Bahmaninezhad; Kshitiz Kumar; Jian Wu

arXiv:2308.06327 · eess.AS · August 15, 2023

Open Access

TL;DR

This paper presents a bilingual streaming ASR model that uses grapheme units and an auxiliary monolingual loss to improve code-mixing recognition and monolingual performance on Spanish and Italian tasks.

Contribution

The paper introduces a fully bilingual alignment model and a bilingual streaming transformer model with a parallel encoder and an auxiliary monolingual loss, which together improve bilingual ASR performance.
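One way to picture a parallel encoder trained with an auxiliary monolingual loss is sketched below. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the per-frame language tag, the separate target indices, and the `aux_weight` value are all hypothetical, and the summary above does not specify how the paper combines its losses.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target class, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def frame_loss(bilingual_logits, primary_logits, english_logits,
               bilingual_target, mono_target, frame_lang, aux_weight=0.3):
    """Main bilingual loss plus a weighted auxiliary monolingual loss.

    Assumption (not confirmed by the source): each encoder branch has its
    own monolingual projection, and only the projection matching the
    frame's language contributes to the auxiliary term.
    """
    main = cross_entropy(bilingual_logits, bilingual_target)
    mono_logits = primary_logits if frame_lang == "primary" else english_logits
    aux = cross_entropy(mono_logits, mono_target)
    return main + aux_weight * aux


# Toy frame: two output classes everywhere, uniform bilingual logits.
loss_primary = frame_loss([0.0, 0.0], [0.0, 0.0], [5.0, 0.0],
                          bilingual_target=0, mono_target=0,
                          frame_lang="primary", aux_weight=0.5)
loss_english = frame_loss([0.0, 0.0], [0.0, 0.0], [5.0, 0.0],
                          bilingual_target=0, mono_target=0,
                          frame_lang="english", aux_weight=0.5)
```

In this sketch the auxiliary term pushes each branch toward its own locale, which is the intuition behind "encoder specialization"; an LID loss would instead train a classifier to predict the language label itself.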

Findings

01 Bilingual models achieve strong code-mixing recognition.

02 Auxiliary monolingual loss outperforms LID loss in encoder specialization.

03 On a code-mixed Italian task, the bilingual Italian model reduces word error rate (WER) from 46.5% to 13.8%.
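To make the WER figures concrete, the sketch below computes word error rate as word-level edit distance divided by reference length, then works out the relative reduction implied by going from 46.5% to 13.8%. The example sentence is made up for illustration and does not come from the paper's test sets; the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Made-up Italian sentence with an English word, for illustration only.
ref = "apri la playlist workout su spotify"
hyp = "apri la playlist uorcaut su spotify"
wer = word_error_rate(ref, hyp)        # one substitution over six words
rel = relative_reduction(46.5, 13.8)   # about 0.70, i.e. roughly 70% relative
```

The absolute drop is 32.7 WER points; relative to the 46.5% baseline that is roughly a 70% reduction.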

Abstract

We introduce a bilingual solution to support English as a secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments are: (a) a pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and, subsequently, a bilingual streaming transformer model, (c) a parallel encoder structure with language identification (LID) loss, and (d) a parallel encoder with an auxiliary loss for monolingual projections. We conclude that, compared with LID loss, our proposed auxiliary loss is superior in specializing the parallel encoders to their respective monolingual locales, which contributes to stronger bilingual learning. We evaluate our work on large-scale training and test tasks for bilingual Spanish (ES) and bilingual Italian (IT) applications. Our bilingual models demonstrate strong English code-mixing…
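As a rough illustration of item (a), a grapheme lexicon maps each word to its sequence of written characters, so no phone-level pronunciation dictionary is needed and words from both languages share one unit inventory. The sketch below assumes single-character units, lowercasing, and NFC normalization; these choices and the helper names are hypothetical, and the paper's actual unit set may differ.

```python
import unicodedata


def grapheme_pronunciation(word: str) -> list[str]:
    """Spell a word out as lowercase single-character grapheme units."""
    # NFC keeps accented letters such as "ñ" or "à" as one code point each.
    normalized = unicodedata.normalize("NFC", word.lower())
    # Assumption: punctuation such as hyphens carries no acoustic unit.
    return [ch for ch in normalized if ch.isalpha()]


def build_lexicon(words):
    """Build a word -> grapheme-sequence lexicon for a mixed word list."""
    return {w: grapheme_pronunciation(w) for w in words}


# Spanish, English, and Italian words can share one lexicon.
lexicon = build_lexicon(["niño", "email", "città"])
```

Because the "pronunciation" is derived from spelling alone, adding English vocabulary to a Spanish or Italian system requires no hand-built phone mappings between the two languages.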

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research