Applying wav2vec2 for Speech Recognition on Bengali Common Voices Dataset

H.A.Z. Sameen Shahgir; Khondker Salman Sayeed; Tanjeem Azwad Zaman

arXiv:2209.06581 · eess.AS · September 15, 2022 · 5 citations

Open Access

TL;DR

This paper fine-tunes wav2vec 2.0 for Bengali speech recognition on the Bengali Common Voice dataset, reaching a WER of 0.2524 on the validation set and the best Levenshtein Distance among competing models (6.234) on a hidden test set.

Contribution

It presents the first application of wav2vec 2.0 to Bengali speech recognition with detailed training and evaluation results.

Findings

01

Achieved a WER of 25.24% on the validation set of 7,747 samples after 71 epochs of training.

02

Reduced the Levenshtein Distance on the test set from 2.6446 to 2.6075 after 7 additional epochs on the combined training and validation data.

03

Outperformed competing models with a Levenshtein Distance of 6.234 on hidden data.
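
The two metrics reported in these findings can be illustrated with a short sketch. It assumes that WER is the word-level edit distance divided by the number of reference words, and that the reported Levenshtein Distance is a character-level edit distance averaged over utterances; the page does not state how the paper computes either, so both definitions and all function names here (`levenshtein`, `wer`, `mean_levenshtein`) are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (works on strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Assumed definition: total word-level edits / total reference words."""
    errors = sum(levenshtein(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def mean_levenshtein(references: List[str], hypotheses: List[str]) -> float:
    """Assumed definition: character-level edit distance averaged per utterance."""
    total = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total / len(references)


if __name__ == "__main__":
    refs = ["a b c d"]
    hyps = ["a x c"]
    print(wer(refs, hyps))               # one substitution + one deletion over 4 words
    print(mean_levenshtein(refs, hyps))  # character-level distance for the same pair
```

Under these definitions, WER is normalized by reference length while the mean Levenshtein Distance is not, which is one reason the two numbers are not directly comparable.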

Abstract

Speech is inherently continuous: discrete words, phonemes and other units are not clearly segmented, which is why speech recognition has been an active research problem for decades. In this work we fine-tuned wav2vec 2.0 to recognize and transcribe Bengali speech, training it on the Bengali Common Voice Speech Dataset. After training for 71 epochs on a training set of 36,919 mp3 files, we achieved a training loss of 0.3172 and a WER of 0.2524 on a validation set of size 7,747. Using a 5-gram language model, the Levenshtein Distance was 2.6446 on a test set of size 7,747. The training set and validation set were then combined, shuffled and split in an 85-15 ratio. Training for 7 more epochs on this combined dataset yielded an improved Levenshtein Distance of 2.60753 on the test set. Our model was the best performing one, achieving a Levenshtein Distance of 6.234 on a…
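
The combine, shuffle and 85-15 split step described in the abstract could look roughly like the sketch below. The function name `combine_and_split`, the fixed seed, the placeholder file names, and rounding the cut point down are all assumptions made for illustration; the abstract does not say how the authors implemented the split.

```python
import random
from typing import List, Sequence, Tuple


def combine_and_split(train: Sequence[str],
                      valid: Sequence[str],
                      train_fraction: float = 0.85,
                      seed: int = 0) -> Tuple[List[str], List[str]]:
    """Merge two file lists, shuffle them, and cut at `train_fraction`.

    The seed and the floor rounding of the cut index are assumptions,
    not details taken from the paper.
    """
    combined = list(train) + list(valid)
    rng = random.Random(seed)  # local RNG so the shuffle is reproducible
    rng.shuffle(combined)
    cut = int(len(combined) * train_fraction)
    return combined[:cut], combined[cut:]


if __name__ == "__main__":
    # Placeholder names; only the set sizes come from the abstract.
    train_files = [f"train_{i}.mp3" for i in range(36919)]
    valid_files = [f"valid_{i}.mp3" for i in range(7747)]
    new_train, new_valid = combine_and_split(train_files, valid_files)
    print(len(new_train), len(new_valid))
```

With the set sizes given in the abstract (36,919 and 7,747) and floor rounding, this sketch would produce roughly 37,966 training and 6,700 validation files; the sizes actually used in the paper are not stated on this page.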

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques