DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set

Arunkumar A; Mudit Batra; Umesh S

arXiv:2210.16739·eess.AS·November 1, 2022

TL;DR

This paper introduces DuDe, a novel dual-decoder architecture for multilingual Indian language ASR that leverages a common label set and machine transliteration to improve recognition across diverse scripts and sounds.

Contribution

It proposes a new Encoder-Decoder-Decoder architecture utilizing common label sets and native scripts, enhancing multilingual ASR for Indian languages.

Findings

01 CLS-based models improve recognition accuracy

02 Dual-decoder architecture outperforms single-decoder models

03 Machine transliteration enhances multilingual system performance

Abstract

In a multilingual country like India, multilingual Automatic Speech Recognition (ASR) systems have wide applicability. Multilingual ASR systems offer advantages such as scalability, maintainability, and improved performance over monolingual ASR systems. However, building multilingual systems for Indian languages is challenging because different languages are written in different scripts. At the same time, Indian languages share many common sounds. The Common Label Set (CLS) exploits this by mapping graphemes from different languages that have similar sounds to common labels. Since Indian languages are mostly phonetic, building a parser to convert native script to CLS is easy. In this paper, we explore various approaches to building multilingual ASR models. We also propose a novel architecture called Encoder-Decoder-Decoder for building multilingual systems that use both CLS and native…
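To make the CLS idea concrete, the following is a minimal sketch of a grapheme-to-label parser, assuming a toy mapping for a handful of Devanagari and Tamil characters. The table `TOY_CLS`, the function `to_common_labels`, and the label strings are hypothetical illustrations; they are not the actual CLS inventory or the parser used in the paper, and the sketch ignores details such as inherent vowels and conjuncts.

```python
# Toy grapheme-to-common-label table. The label strings ("k", "m", ...) are
# illustrative placeholders, NOT the real Common Label Set.
TOY_CLS = {
    # Devanagari (e.g. Hindi)
    "क": "k", "म": "m", "ल": "l", "ा": "aa",
    # Tamil
    "க": "k", "ம": "m", "ல": "l", "ா": "aa",
}


def to_common_labels(text):
    """Map each grapheme to its common label.

    Whitespace is skipped; characters missing from the toy table
    are emitted as "<unk>".
    """
    return [TOY_CLS.get(ch, "<unk>") for ch in text if not ch.isspace()]


# The same sound sequence written in two scripts yields the same labels.
hindi_labels = to_common_labels("कमल")
tamil_labels = to_common_labels("கமல")
print(hindi_labels, tamil_labels, hindi_labels == tamil_labels)
```

Because both scripts collapse to one label sequence, a single model can share output units across languages, which is the property the abstract attributes to CLS.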

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing