Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models

Mohammed Rakib; Md. Ismail Hossain; Nabeel Mohammed; Fuad Rahman

arXiv:2209.12650·cs.CL·September 27, 2022

TL;DR

This paper enhances Bangla speech-to-text transcription by fine-tuning wav2vec2 models with the Bengali Common Voice dataset and integrating n-gram language models, achieving superior performance over existing models.

Contribution

It introduces a novel approach combining pretrained wav2vec2 fine-tuning with n-gram language models for improved Bangla ASR performance.

Findings

01 Outperforms state-of-the-art Bengali ASR models

02 Significant accuracy improvement with n-gram language models

03 Robust Bangla ASR model through hyperparameter tuning

Abstract

Although over 300 million people around the world speak Bangla, little work has been done to improve Bangla voice-to-text transcription because Bangla is a low-resource language. However, with the introduction of the Bengali Common Voice 9.0 speech dataset, Automatic Speech Recognition (ASR) models can now be significantly improved. With 399 hours of speech recordings, Bengali Common Voice is the largest and most diversified open-source Bengali speech corpus in the world. In this paper, we outperform the SOTA pretrained Bengali ASR models by fine-tuning a pretrained wav2vec2 model on the Common Voice dataset. We also demonstrate how to significantly improve the performance of an ASR model by adding an n-gram language model as a post-processor. Finally, we run experiments and tune hyperparameters to produce a robust Bangla ASR model that outperforms the existing ASR models.
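To make the idea of an n-gram language model as a post-processor concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: the bigram order, add-one smoothing, the weight `alpha`, the function names, the romanized toy sentences, and the acoustic scores are all assumptions chosen for illustration. It shows how candidate transcriptions from an acoustic model could be rescored by adding a weighted language-model log-probability.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def lm_logprob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def rescore(hypotheses, unigrams, bigrams, alpha=0.5):
    """Return the text maximizing acoustic_logprob + alpha * lm_logprob.

    `hypotheses` is a list of (text, acoustic_logprob) pairs; `alpha` is a
    hypothetical language-model weight, not a value from the paper.
    """
    best = max(
        hypotheses,
        key=lambda h: h[1] + alpha * lm_logprob(h[0], unigrams, bigrams),
    )
    return best[0]


# Toy romanized corpus and made-up acoustic scores (illustrative only).
corpus = ["ami bhat khai", "ami boi pori", "tumi bhat khao"]
unigrams, bigrams = train_bigram(corpus)
candidates = [("ami bhat kai", -4.8), ("ami bhat khai", -5.0)]
print(rescore(candidates, unigrams, bigrams, alpha=0.5))
```

In this toy setup the acoustic scores alone slightly favor the misspelled candidate, while the combined score favors the candidate whose word sequence appears in the training text. A production system would typically use a higher-order model trained on a large text corpus and integrate it during beam search rather than rescoring a fixed candidate list.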

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing