Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Md Sazzadul Islam Ridoy; Sumi Akter; Md. Aminur Rahman

arXiv:2507.01931 · cs.CL · July 3, 2025

TL;DR

This study compares Whisper and Wav2Vec-BERT ASR models on Bangla, revealing Wav2Vec-BERT's superior performance and efficiency in low-resource language settings.

Contribution

It provides a systematic comparison of two leading ASR models on Bangla, highlighting Wav2Vec-BERT's advantages in accuracy and resource utilization.

Findings

01 Wav2Vec-BERT outperforms Whisper in WER and CER

02 Wav2Vec-BERT requires fewer computational resources

03 Systematic hyperparameter tuning improves model performance

Abstract

In recent years, neural models trained on large multilingual text and speech datasets have shown great potential for supporting low-resource languages. This study investigates the performance of two state-of-the-art Automatic Speech Recognition (ASR) models, OpenAI's Whisper (Small & Large-V2) and Facebook's Wav2Vec-BERT, on Bangla, a low-resource language. We conducted experiments on two publicly available datasets, Mozilla Common Voice-17 and OpenSLR, to evaluate model performance. Through systematic fine-tuning and hyperparameter optimization, including learning rate, epochs, and model checkpoint selection, we compared the models on Word Error Rate (WER), Character Error Rate (CER), training time, and computational efficiency. The Wav2Vec-BERT model outperformed Whisper across all key evaluation metrics, demonstrating superior performance while requiring fewer…
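To make the two accuracy metrics named in the abstract concrete, here is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length in words or characters. This is not the authors' evaluation code. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and the sketch assumes whitespace tokenization, no text normalization, spaces counted as characters, and Unicode code points as the character unit (for Bangla, a single visible grapheme may span several code points).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: char-level edit distance / reference length."""
    ref_chars = list(reference)  # assumption: code points, spaces included
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(hypothesis)) / len(ref_chars)


if __name__ == "__main__":
    # Illustrative strings only, not taken from the paper's datasets.
    reference = "আমি ভাত খাই"
    hypothesis = "আমি ভাত খাব"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Both metrics can exceed 1.0 when the hypothesis contains many insertions, because the denominator is the reference length only. Published WER and CER figures also depend on the text normalization applied before scoring, so values from this sketch are not directly comparable to the paper's results.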

Code & Models

Models

sazzadul/Shrutimala_Bangla_ASR (Hugging Face model · 167 downloads · 1 like)

Taxonomy

Topics: Speech Recognition and Synthesis · ICT in Developing Communities · Natural Language Processing Techniques