BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
Md. Nazmus Sadat Samin, Jawad Ibn Ahad, Tanjila Ahmed Medha, Fuad Rahman, Mohammad Ruhul Amin, Nabeel Mohammed, Shafin Rahman

TL;DR
This paper introduces BanglaDialecto, an end-to-end AI system that recognizes Bangladeshi dialects and standardizes them into formal Bengali speech, leveraging a large dataset and fine-tuned multilingual models.
Contribution
It presents a novel dataset and pipeline for dialect recognition and standardization of Bangla speech using fine-tuned multilingual models, addressing a low-resource language challenge.
Findings
Achieved a 0.8% character error rate (CER) and a 1.5% word error rate (WER) with the fine-tuned Whisper ASR model.
Attained a BLEU score of 41.6% with BanglaT5 for dialect translation.
Demonstrated effective dialect standardization for inclusive communication.
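The CER and WER figures above are edit-distance metrics: the number of substitutions, insertions, and deletions needed to turn the model output into the reference transcript, divided by the reference length. The sketch below shows the standard definitions only; it is not the authors' evaluation code. The function names and the romanized example sentence are hypothetical, and it assumes whitespace tokenization for words and Unicode code points for characters (for Bangla script, a grapheme-cluster split may be more appropriate).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical romanized example: one character dropped in the second word.
reference = "ami bhat khai"
hypothesis = "ami bat khai"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because one wrong character makes the whole word count as an error, WER is usually higher than CER on the same output, which matches the ordering of the two reported figures.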
Abstract
This study focuses on recognizing Bangladeshi dialects and converting diverse Bengali accents into standardized formal Bengali speech. Dialects, often referred to as regional languages, are distinctive variations of a language spoken in a particular location and are identified by their phonetics, pronunciations, and lexicon. Subtle changes in pronunciation and intonation are also influenced by geographic location, educational attainment, and socioeconomic status. Dialect standardization is needed to ensure effective communication, educational consistency, access to technology, economic opportunities, and the preservation of linguistic resources while respecting cultural diversity. Because Bangla is the fifth most spoken language, with around 55 distinct dialects spoken by 160 million people, addressing its dialects is crucial for developing inclusive communication tools. However, limited research…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
