An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning
Tushar Talukder Showrav

TL;DR
This paper presents a Bengali speech recognition system using Wav2Vec2 and transfer learning, achieving promising results with limited training data in a low-resource language context.
Contribution
It introduces a transfer learning-based end-to-end speech recognition approach tailored for Bengali, addressing data scarcity issues.
Findings
Achieved a Levenshtein Mean Distance score of 3.819 on test data
Effectively modeled Bengali speech with only 1000 training samples
Demonstrated potential for low-resource language ASR development
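The Levenshtein Mean Distance reported above can be illustrated with a minimal sketch. This summary does not say whether the paper measures distance at the character or word level, or whether it is normalized, so the code below assumes an unnormalized character-level edit distance over Unicode code points, averaged over utterances. The function names `levenshtein` and `mean_levenshtein` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, counted in Unicode code points.

    Insertions, deletions, and substitutions each cost 1 (an assumption;
    the paper's exact cost scheme is not given in this summary).
    """
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def mean_levenshtein(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Average edit distance over paired reference and predicted transcripts."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("at least one transcript pair is required")
    total = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total / len(references)
```

Under this definition a lower score is better, and a score of 0 means every predicted transcript matches its reference exactly. Because Bengali text uses combining vowel signs, counting code points may differ from counting user-perceived characters.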
Abstract
Automatic speech recognition (ASR) is an independent, automated method of decoding and transcribing spoken language. A typical ASR system extracts features from audio recordings or streams and runs one or more algorithms to map those features to the corresponding text. A great deal of research has been done in the field of speech signal processing in recent years. When given adequate resources, both conventional ASR and emerging end-to-end (E2E) speech recognition have produced promising results. However, for low-resource languages like Bengali, the current state of ASR lags behind, even though the language is spoken by over 500 million people all over the world. Despite its popularity, few diverse open-source datasets are available, which makes it difficult to conduct research on Bengali speech recognition systems. This paper…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Test
