Deep Feed-Forward Neural Network for Bangla Isolated Speech Recognition
Dipayan Bhadra, Mehrab Hosain, Fatema Alam

TL;DR
This paper presents a deep feed-forward neural network approach using MFCC features for isolated Bangla speech recognition, achieving over 93% accuracy and addressing the limited research in Bangla speech processing.
Contribution
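It proposes a 7-layer DFFNN classifier operating on MFCC features for speaker-independent recognition of isolated spoken words, and reports accuracy better than most previous Bangla speech recognition work given the number of classes and dataset size.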
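As a rough illustration of the MFCC front end named above, the following NumPy sketch computes cepstral coefficients from a mono waveform. The parameter choices (16 kHz sampling, 25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients, 0.97 pre-emphasis) are common defaults assumed here for illustration; the paper's actual settings are not given in this summary.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). All defaults are assumptions."""
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies; keep the first n_coeffs.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * np.arange(n_filters) + 1)
                   / (2 * n_filters))
    return energies @ basis.T

# Synthetic one-second 440 Hz tone standing in for a recorded word.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone)
```

For a feed-forward classifier, the per-frame coefficients would typically be flattened or pooled into a fixed-length vector per utterance; which of these the authors use is not stated here.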
Findings
Achieved 93.42% recognition accuracy.
Demonstrated effectiveness of deep neural networks for Bangla speech.
Compared favorably with most prior Bangla speech recognition work, considering the number of classes and dataset size.
Abstract
Although speech is the most important human-machine interfacing tool, far less work has been carried out on Bangla speech recognition than on English. Motivated by this, this work implements and analyzes speaker-independent isolated speech recognition systems using a newly created dataset containing both isolated Bangla and English spoken words. It proposes an approach that uses Mel Frequency Cepstral Coefficients (MFCC) as features and a 7-layer Deep Feed-Forward Fully Connected Neural Network (DFFNN) as the classifier to recognize isolated spoken words. The approach achieves 93.42% recognition accuracy, which is better than most previous work on Bangla speech recognition when the number of classes and dataset size are taken into account.
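To make the classifier concrete, here is a minimal NumPy sketch of the forward pass of a fully connected feed-forward network. It reads "7 layers" as seven weight layers (six ReLU hidden layers plus a softmax output), which is an assumption; the layer widths, input dimension, class count, activation, and initialization below are all hypothetical and not taken from the paper.

```python
import numpy as np

def init_dffnn(layer_sizes, seed=0):
    """Create (weights, bias) pairs for consecutive fully connected layers."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # He-style initialization, a common choice for ReLU networks (assumed).
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def forward(params, x):
    """Return class probabilities for a batch x of shape (batch, n_features)."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)                 # hidden layer with ReLU
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)       # softmax

# Hypothetical dimensions: a flattened MFCC vector in, one unit per word class out.
N_FEATURES = 13 * 40
N_CLASSES = 20
layer_sizes = [N_FEATURES, 512, 256, 256, 128, 128, 64, N_CLASSES]  # 7 weight layers

params = init_dffnn(layer_sizes)
batch = np.random.default_rng(1).normal(size=(4, N_FEATURES))  # stand-in features
probs = forward(params, batch)
predicted = probs.argmax(axis=1)
```

The weights here are untrained, so the predictions are meaningless; in practice the parameters would be fitted with a cross-entropy loss on labeled utterances.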
