Robust Persian Digit Recognition in Noisy Environments Using Hybrid CNN-BiGRU Model
Ali Nasr-Esfahani, Mehdi Bekrani, Roozbeh Rajabi

TL;DR
This paper presents a hybrid CNN-BiGRU model for recognizing isolated spoken Persian digits in noisy environments, reaching 95.92% test accuracy on clean data and outperforming phoneme-based LSTM and MTDRCC+MLP models.
Contribution
It introduces a novel hybrid of residual CNNs and bidirectional GRUs that uses word units instead of phoneme units for noise-robust, speaker-independent Persian digit recognition, outperforming phoneme-based models.
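To make the hybrid architecture concrete, here is a minimal, untrained pure-Python forward pass: a residual 1-D convolution over MFCC frames, a bidirectional GRU, mean pooling over time, and a softmax over ten word-level classes (one per digit, so each utterance gets a single label rather than a phoneme sequence). All layer sizes, the kernel width of 3, the random weights, the omitted biases, the mean pooling, and every function name are illustrative assumptions, not the authors' configuration. The GRU follows the Cho et al. (2014) form, where the reset gate scales the previous state before the recurrent matrix.

```python
import math
import random

# Illustrative sizes only (assumptions, not the paper's configuration).
N_MFCC = 13      # MFCC coefficients per frame
HIDDEN = 8       # GRU hidden units per direction
N_CLASSES = 10   # word units: one output class per digit, zero to nine


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def residual_conv1d(frames, kernel):
    """Same-padded 1-D convolution over time, ReLU, then add the input (skip connection).
    kernel is a list of (C x C) matrices, one per time offset."""
    T, C = len(frames), len(frames[0])
    half = len(kernel) // 2
    out = []
    for t in range(T):
        acc = [0.0] * C
        for k, w in enumerate(kernel):
            src = t + k - half
            if 0 <= src < T:  # zero padding outside the utterance
                acc = [a + b for a, b in zip(acc, matvec(w, frames[src]))]
        out.append([max(0.0, a) + x for a, x in zip(acc, frames[t])])
    return out


class GRUCell:
    """GRU in the Cho et al. (2014) form; biases omitted for brevity."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.W = {g: rand_matrix(n_hidden, n_in, rng) for g in "zrn"}      # input weights
        self.U = {g: rand_matrix(n_hidden, n_hidden, rng) for g in "zrn"}  # recurrent weights

    def gate(self, g, x, h, act):
        return [act(a + b) for a, b in zip(matvec(self.W[g], x), matvec(self.U[g], h))]

    def step(self, x, h):
        z = self.gate("z", x, h, sigmoid)     # update gate
        r = self.gate("r", x, h, sigmoid)     # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = self.gate("n", x, rh, math.tanh)  # candidate state
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def bigru(frames, fwd, bwd):
    """Run one GRU forward and one backward in time; concatenate states per frame."""
    h_f, h_b = [0.0] * fwd.n_hidden, [0.0] * bwd.n_hidden
    outs_f, outs_b = [], []
    for x in frames:
        h_f = fwd.step(x, h_f)
        outs_f.append(h_f)
    for x in reversed(frames):
        h_b = bwd.step(x, h_b)
        outs_b.append(h_b)
    outs_b.reverse()  # realign backward states with time order
    return [f + b for f, b in zip(outs_f, outs_b)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, seed=0):
    """Untrained forward pass: MFCC frames -> probabilities over the 10 digit words."""
    rng = random.Random(seed)
    kernel = [rand_matrix(N_MFCC, N_MFCC, rng) for _ in range(3)]  # kernel width 3
    fwd, bwd = GRUCell(N_MFCC, HIDDEN, rng), GRUCell(N_MFCC, HIDDEN, rng)
    w_out = rand_matrix(N_CLASSES, 2 * HIDDEN, rng)
    seq = bigru(residual_conv1d(frames, kernel), fwd, bwd)
    pooled = [sum(col) / len(seq) for col in zip(*seq)]  # average over time
    return softmax(matvec(w_out, pooled))


rng = random.Random(42)
utterance = [[rng.gauss(0, 1) for _ in range(N_MFCC)] for _ in range(20)]  # 20 fake MFCC frames
probs = classify(utterance)
print(len(probs), round(sum(probs), 6))  # → 10 1.0
```

Because the backward GRU reads the utterance from the end, every frame's representation sees both earlier and later context, which is the usual motivation for the bidirectional layer.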
Findings
Achieved 98.53% training accuracy on clean data (96.10% on validation and 95.92% on test)
Improved recognition by 26.88% in noisy conditions
Outperformed phoneme-based LSTM and MTDRCC+MLP models
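The summary does not say how the noisy conditions were constructed. A common approach, shown here only as an assumed illustration, is to add noise scaled to a target signal-to-noise ratio, with SNR in dB defined as 10·log10(P_signal / P_noise). The white Gaussian noise, the 8 kHz sample rate, the sine-wave stand-in for a spoken digit, the SNR levels, and the function names below are all hypothetical; the paper's actual noise types and levels are not stated here.

```python
import math
import random


def power(signal):
    """Mean squared amplitude."""
    return sum(s * s for s in signal) / len(signal)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) == snr_db, then add it to `clean`."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [c + scale * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR from the clean reference and the noisy mixture."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


sr = 8000  # assumed sample rate
rng = random.Random(0)
clean = [0.5 * math.sin(2 * math.pi * 300 * t / sr) for t in range(sr // 2)]  # stand-in for a digit
white = [rng.gauss(0, 1) for _ in range(sr // 2)]
for target in (20, 10, 5, 0):
    noisy = add_noise_at_snr(clean, white, target)
    print(f"target {target} dB -> measured {measured_snr_db(clean, noisy):.2f} dB")
```

The same mixing function could plausibly double as a training-time augmentation step, since lower target SNRs simply produce harder examples.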
Abstract
Artificial intelligence (AI) has significantly advanced speech recognition applications. However, many existing neural network-based methods struggle with noise, reducing accuracy in real-world environments. This study addresses isolated spoken Persian digit recognition (zero to nine) under noisy conditions, particularly for phonetically similar numbers. A hybrid model combining residual convolutional neural networks and bidirectional gated recurrent units (BiGRU) is proposed, utilizing word units instead of phoneme units for speaker-independent recognition. The FARSDIGIT1 dataset, augmented with various approaches, is processed using Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. Experimental results demonstrate the model's effectiveness, achieving 98.53%, 96.10%, and 95.92% accuracy on training, validation, and test sets, respectively. In noisy conditions, the…
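The abstract names MFCCs as the feature front end but gives no parameters. The pure-Python sketch below walks through the standard steps (pre-emphasis, Hamming-windowed framing, power spectrum, triangular mel filterbank, log, and an unnormalized DCT-II) under assumed settings: 8 kHz audio, 25 ms frames with a 10 ms hop, a 256-point transform, 26 mel filters, and 13 coefficients. These values, the 0.97 pre-emphasis factor, and the function names are illustrative assumptions, not the authors' configuration. A direct DFT keeps the example dependency-free; a real pipeline would use an FFT.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, over bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sr / 2)
    mel_pts = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mel_pts]
    fbank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):   # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            row[k] = (right - k) / (right - center)
        fbank.append(row)
    return fbank


def mfcc(signal, sr=8000, frame_len=200, hop=80, n_fft=256, n_filters=26, n_ceps=13):
    """Return one list of n_ceps coefficients per frame."""
    emph = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    fbank = mel_filterbank(n_filters, n_fft, sr)
    n_bins = n_fft // 2 + 1
    feats = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [emph[start + n] * window[n] for n in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)  # zero-pad to n_fft
        spectrum = []  # power spectrum via a direct O(n_fft^2) DFT
        for k in range(n_bins):
            re = sum(x * math.cos(2 * math.pi * k * n / n_fft) for n, x in enumerate(frame))
            im = -sum(x * math.sin(2 * math.pi * k * n / n_fft) for n, x in enumerate(frame))
            spectrum.append((re * re + im * im) / n_fft)
        # log mel energies; the small epsilon avoids log(0) for empty filters
        log_e = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10) for row in fbank]
        # DCT-II of the log energies, keeping the first n_ceps coefficients
        ceps = [sum(e * math.cos(math.pi * i * (j + 0.5) / n_filters) for j, e in enumerate(log_e))
                for i in range(n_ceps)]
        feats.append(ceps)
    return feats


sr = 8000
tone = [math.sin(2 * math.pi * 500 * t / sr) for t in range(800)]  # 0.1 s stand-in for a digit
coeffs = mfcc(tone, sr=sr)
print(len(coeffs), len(coeffs[0]))  # → 8 13
```

Each row of `coeffs` is one frame's 13-dimensional feature vector; a sequence of such rows is the kind of input a CNN-BiGRU classifier consumes.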
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
