A Dual-Decoder Conformer for Multilingual Speech Recognition

Krishna D N

arXiv:2109.03277·cs.CL·September 9, 2021·1 cites

TL;DR

This paper introduces a dual-decoder Conformer model for multilingual speech recognition in low-resource Indian languages, leveraging multi-task learning to improve accuracy over traditional single-decoder models.

Contribution

It presents a novel dual-decoder architecture with phoneme and grapheme decoders, jointly trained with language classification for enhanced multilingual speech recognition.

Findings

01. Significant reduction in WER compared to baseline models
02. Dual-decoder approach outperforms single-decoder models
03. Multi-task learning with phoneme recognition and language identification as auxiliary tasks improves recognition accuracy

Abstract

Transformer-based models have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. This work proposes a dual-decoder transformer model for low-resource multilingual speech recognition for Indian languages. Our proposed model consists of a Conformer [1] encoder, two parallel transformer decoders, and a language classifier. We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict grapheme sequence along with language information. We consider phoneme recognition and language identification as auxiliary tasks in the multi-task learning framework. We jointly optimize the network for phoneme recognition, grapheme recognition, and language identification tasks with Joint CTC-Attention [2] training. Our experiments show that we can obtain a significant reduction in WER over…
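The joint optimization described in the abstract can be illustrated with a minimal sketch of how the per-task losses might be combined. Joint CTC-Attention training interpolates a CTC loss and an attention (cross-entropy) loss with a weight in [0, 1]; that part is standard. Everything else below is an assumption for illustration and is not taken from the paper: the names `TaskLosses`, `joint_ctc_attention`, and `multitask_loss`, the choice to apply a CTC term to both the phoneme and grapheme branches, and the default weights `ctc_weight`, `phn_weight`, and `lang_weight`.

```python
from dataclasses import dataclass


@dataclass
class TaskLosses:
    """Per-batch scalar losses. All field names are hypothetical."""
    phn_ctc: float  # CTC loss on phoneme targets (assumed to exist)
    phn_att: float  # attention loss from the phoneme decoder (PHN-DEC)
    grp_ctc: float  # CTC loss on grapheme targets (assumed to exist)
    grp_att: float  # attention loss from the grapheme decoder (GRP-DEC)
    lang_id: float  # cross-entropy loss from the language classifier


def joint_ctc_attention(ctc: float, att: float, ctc_weight: float) -> float:
    """Interpolate CTC and attention losses: w * ctc + (1 - w) * att."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc + (1.0 - ctc_weight) * att


def multitask_loss(
    losses: TaskLosses,
    ctc_weight: float = 0.3,   # assumed value, not from the paper
    phn_weight: float = 0.5,   # assumed weight of the auxiliary phoneme task
    lang_weight: float = 0.1,  # assumed weight of the auxiliary language-ID task
) -> float:
    """Combine the primary grapheme loss with the two auxiliary losses."""
    grp = joint_ctc_attention(losses.grp_ctc, losses.grp_att, ctc_weight)
    phn = joint_ctc_attention(losses.phn_ctc, losses.phn_att, ctc_weight)
    return grp + phn_weight * phn + lang_weight * losses.lang_id
```

In this sketch the grapheme branch is treated as the primary task with unit weight, while the phoneme and language-identification terms are scaled down as auxiliary tasks. The actual weighting scheme used in the paper may differ.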

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing