Dual Script E2E framework for Multilingual and Code-Switching ASR

Mari Ganesh Kumar; Jom Kuriakose; Anand Thyagachandran; Arun Kumar A; Ashish Seth; Lodagala Durga Prasad; Saish Jaiswal; Anusha Prakash; Hema A. Murthy

arXiv:2106.01400·eess.AS·June 4, 2021

TL;DR

This paper introduces a dual script end-to-end framework for multilingual and code-switching ASR in Indian languages, leveraging a rule-based phoneme-level label set and novel back-end techniques to improve recognition accuracy.

Contribution

It proposes two innovative E2E ASR systems using a common label set and native script recovery, advancing multilingual and code-switching speech recognition for Indian languages.

Findings

01

Achieved approximately 6% WER improvement over the baseline system in multilingual ASR

02

Achieved approximately 5% WER improvement over the baseline system in code-switching ASR

03

Demonstrated effectiveness on the Indic ASR Challenge 2021
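
The findings above are stated in terms of word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch that computes WER as word-level edit distance divided by reference length. The function name `wer` and the whitespace tokenization are assumptions made for illustration; the listing does not say how the challenge scored systems, nor whether the reported improvements are absolute or relative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words.

    Illustrative helper only; tokenization is plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(wer("a b c d", "a x c"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions.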

Abstract

India is home to multiple languages, and training automatic speech recognition (ASR) systems for each of them is challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most Indian languages also have their own unique scripts, which poses a major limitation in training multilingual and code-switching ASR systems. Inspired by results in text-to-speech synthesis, in this work, we use an in-house rule-based phoneme-level common label set (CLS) representation to train multilingual and code-switching ASR for Indian languages. We propose two end-to-end (E2E) ASR systems. In the first system, the E2E model is trained on the CLS representation, and we use a novel data-driven back-end to recover the native language script. In the second system, we propose a modification to the E2E model, wherein the CLS representation and the native…
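
To illustrate the idea of a common label set, the sketch below maps characters from two scripts onto shared labels and then maps the labels back to a chosen script. Everything in it is hypothetical: the tables cover three consonants per script, the label names are invented, and the reverse step is a plain lookup keyed on a known language tag. It is not the in-house CLS, and it is not the data-driven back-end the abstract describes.

```python
# Hypothetical toy tables: three consonant letters per script, invented labels.
TO_CLS = {
    "hindi": {"क": "ka", "म": "ma", "ल": "la"},   # Devanagari
    "tamil": {"க": "ka", "ம": "ma", "ல": "la"},   # Tamil
}

# Reverse tables for script recovery (assumes labels are unique per language).
FROM_CLS = {
    lang: {label: char for char, label in table.items()}
    for lang, table in TO_CLS.items()
}


def to_cls(text: str, lang: str) -> list[str]:
    """Map native-script characters to shared labels, one label per character."""
    return [TO_CLS[lang][char] for char in text]


def to_native(labels: list[str], lang: str) -> str:
    """Recover native script from shared labels, given a known language tag."""
    return "".join(FROM_CLS[lang][label] for label in labels)


if __name__ == "__main__":
    hindi_labels = to_cls("कमल", "hindi")
    tamil_labels = to_cls("கமல", "tamil")
    # Both scripts collapse to the same label sequence.
    print(hindi_labels == tamil_labels)
    # The same labels can be rendered in either script.
    print(to_native(hindi_labels, "tamil"))
```

The sketch shows why a shared label space lets one model train on pooled data from several scripts, and why a separate step is then needed to decide which script to emit. A lookup table sidesteps that decision by taking the language as input, which a real recognizer cannot assume.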
