Robust Persian Digit Recognition in Noisy Environments Using Hybrid CNN-BiGRU Model

Ali Nasr-Esfahani; Mehdi Bekrani; Roozbeh Rajabi

arXiv:2412.10857 · cs.SD · February 12, 2025

TL;DR

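This paper presents a hybrid residual CNN-BiGRU model for recognizing isolated spoken Persian digits (zero to nine) in noisy environments. It reaches 95.92% test accuracy and improves recognition in noisy conditions by 26.88%, outperforming phoneme-based LSTM and MTDRCC+MLP baselines.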

Contribution

It introduces a novel hybrid CNN-BiGRU approach using word units for noise-robust Persian digit recognition, outperforming phoneme-based models.

Findings

01. Achieved 98.53%, 96.10%, and 95.92% accuracy on the training, validation, and test sets, respectively
02. Improved recognition by 26.88% in noisy conditions
03. Outperformed phoneme-based LSTM and MTDRCC+MLP models

Abstract

Artificial intelligence (AI) has significantly advanced speech recognition applications. However, many existing neural network-based methods struggle with noise, reducing accuracy in real-world environments. This study addresses isolated spoken Persian digit recognition (zero to nine) under noisy conditions, particularly for phonetically similar numbers. A hybrid model combining residual convolutional neural networks and bidirectional gated recurrent units (BiGRU) is proposed, utilizing word units instead of phoneme units for speaker-independent recognition. The FARSDIGIT1 dataset, augmented with various approaches, is processed using Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. Experimental results demonstrate the model's effectiveness, achieving 98.53%, 96.10%, and 95.92% accuracy on training, validation, and test sets, respectively. In noisy conditions, the…
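The visible part of the abstract does not say how the noisy test conditions were produced. As a rough illustration only, the sketch below shows one common way to simulate them: adding noise to a clean recording at a chosen signal-to-noise ratio (SNR). The noise type, the 5 dB level, the 16 kHz sample rate, the sine-tone stand-in for a spoken digit, and the helper names `mean_power`, `mix_at_snr`, and `measured_snr_db` are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR in dB.

    SNR_dB = 10 * log10(P_clean / P_noise), so the noise is rescaled to
    power P_clean / 10^(SNR_dB / 10) before it is added.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of a mixture in dB, computed against the clean reference."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Hypothetical stand-ins: a 200 Hz tone in place of a spoken digit,
# and white Gaussian noise in place of a recorded noise source.
rng = random.Random(0)
sample_rate = 16000  # assumed, not stated in the abstract
clean = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy = mix_at_snr(clean, noise, snr_db=5.0)
print(round(measured_snr_db(clean, noisy), 3))
```

In a real pipeline the noisy waveform would then go through the same MFCC feature extraction as the clean data before being passed to the classifier.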

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory