Keyword spotting using convolutional neural network for speech recognition in Hindi
Saru Bharti, Pushparaj Mani Pathak

TL;DR
This paper presents a CNN-based keyword spotting (KWS) system for Hindi speech recognition that takes MFCC features as input, reaches 91.79% accuracy, and targets efficient on-device processing.
Contribution
It introduces a CNN approach to Hindi KWS that converts raw audio into MFCC features, and it compares several CNN architectures for accuracy and computational efficiency.
Findings
Achieved 91.79% accuracy in keyword spotting.
Utilized MFCC features for effective CNN input.
Demonstrated suitability for on-device Hindi speech recognition.
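To make the MFCC step concrete, here is a minimal NumPy-only sketch of how a raw clip could be turned into a frames-by-coefficients matrix for a CNN. It is not the authors' pipeline: the frame length (2048), hop (512), 40 mel filters, and 13 coefficients are assumed values, "44 kHz" is taken to mean 44,100 Hz, and a synthetic 1.9-second tone stands in for a real recording.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(signal, sample_rate=44100, n_fft=2048, hop=512, n_filters=40, n_coeffs=13):
    # All default parameter values are assumptions, not taken from the paper.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_mel = np.log(power @ fbank.T + 1e-10)           # log mel energies
    # Unnormalised DCT-II over the filter axis; keep the first n_coeffs terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T                            # (n_frames, n_coeffs)

sample_rate = 44100                       # assumed reading of "44 kHz"
n_samples = sample_rate * 19 // 10        # 1.9 seconds of audio
t = np.arange(n_samples) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)    # synthetic stand-in for a recording
features = mfcc(signal, sample_rate)
print(features.shape)
```

Under these assumed settings a 1.9-second clip yields a small 2D array that can be treated like a single-channel image, which is what makes a convolutional classifier a natural fit.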
Abstract
In this study, we investigate keyword spotting (KWS) for Hindi speech recognition using a dataset of 40,000 audio samples, recorded at a sampling rate of 44 kHz with an average duration of 1.9 seconds per sample. We focus on developing an efficient on-device KWS system tailored for user-specific queries. We use Convolutional Neural Networks (CNNs) for classification and apply feature engineering to convert raw audio recordings into Mel Frequency Cepstral Coefficients (MFCCs), which serve as the network input. Our experiments cover various CNN architectures and explore their efficacy in identifying predefined keywords within a continuous speech stream. Under rigorous evaluation, our CNN-based approach achieves an accuracy of 91.79%, demonstrating promising performance while ensuring computational efficiency and…
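As an illustration of CNN classification over MFCC input, the following is a hedged NumPy sketch of a single forward pass through one convolution layer, max pooling, and a dense softmax layer. The architecture, the 160 x 13 input shape, the 8 kernels, and the 10 keyword classes are all hypothetical choices for illustration; the summary does not state the paper's architectures or keyword count, and the weights here are random rather than trained.

```python
import numpy as np

def conv2d_valid(x, kernels):
    # x: (H, W); kernels: (C, kh, kw) -> output (C, H-kh+1, W-kw+1).
    # Cross-correlation without padding, as in typical CNN layers.
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1:])
    return np.einsum("hwij,cij->chw", windows, kernels)

def max_pool(x, size=2):
    # Non-overlapping max pooling; trailing rows/columns that do not fit are dropped.
    c, h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:, :h, :w].reshape(c, h // size, size, w // size, size).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

def forward(mfcc_input, params):
    h = conv2d_valid(mfcc_input, params["conv_w"]) + params["conv_b"][:, None, None]
    h = np.maximum(h, 0.0)    # ReLU
    h = max_pool(h)
    return softmax(params["dense_w"] @ h.ravel() + params["dense_b"])

rng = np.random.default_rng(0)
n_keywords = 10                                  # hypothetical number of keywords
mfcc_input = rng.standard_normal((160, 13))      # hypothetical frames x coefficients
n_kernels = 8
# conv output (8, 158, 11) -> pooled (8, 79, 5) -> 3160 flattened features
flat_size = n_kernels * 79 * 5
params = {
    "conv_w": rng.standard_normal((n_kernels, 3, 3)) * 0.1,   # untrained weights
    "conv_b": np.zeros(n_kernels),
    "dense_w": rng.standard_normal((n_keywords, flat_size)) * 0.01,
    "dense_b": np.zeros(n_keywords),
}
probs = forward(mfcc_input, params)
print(probs.shape, int(probs.argmax()))
```

In a real system the parameters would be learned from labelled keyword clips, and the predicted keyword would be the class with the highest probability.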
