Keyword spotting using convolutional neural network for speech recognition in Hindi

Saru Bharti; Pushparaj Mani Pathak

arXiv:2605.02928·cs.SD·May 6, 2026

TL;DR

This paper presents a CNN-based keyword spotting system for Hindi speech recognition, achieving over 91% accuracy with efficient on-device processing using MFCC features.

Contribution

The paper introduces a CNN-based approach to Hindi keyword spotting (KWS) that takes MFCC features as input, and it compares several CNN architectures for accuracy and computational efficiency.

Findings

01. Achieved 91.79% accuracy in keyword spotting.

02. Used MFCC features as the CNN input.

03. Demonstrated suitability for on-device Hindi speech recognition.

Abstract

In this study, we investigate the application of keyword spotting (KWS) in the domain of Hindi speech recognition, utilizing a dataset comprising 40,000 audio samples. With a sampling rate of 44 kHz and an average duration of 1.9 seconds per sample, we focus on developing an efficient on-device KWS system tailored for user-specific queries. Leveraging Convolutional Neural Networks (CNNs) for classification, we employ feature engineering techniques to convert raw audio recordings into Mel Frequency Cepstral Coefficients (MFCCs) as an input for our network. Our experiments encompass various CNN architectures, exploring their efficacy in identifying predefined keywords within the continuous speech stream. Our CNN-based approach achieves a commendable accuracy rate of 91.79% through rigorous evaluation, demonstrating promising performance while ensuring computational efficiency and…
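
To give a sense of the input a CNN would see, here is a minimal sketch that estimates the size of the MFCC matrix for one average-length clip. The sampling rate (44 kHz) and duration (1.9 s) come from the abstract. The 25 ms window, 10 ms hop, and 13 coefficients are common defaults assumed here for illustration only; the abstract does not state the authors' framing parameters, and `mfcc_shape` is a hypothetical helper, not part of the paper.

```python
# Sketch: shape of the MFCC feature matrix for one clip.
# From the abstract: 44 kHz sampling rate, 1.9 s average duration.
# Assumed (NOT from the paper): 25 ms window, 10 ms hop, 13 coefficients.

SAMPLE_RATE_HZ = 44_000   # from the abstract
AVG_DURATION_S = 1.9      # from the abstract
WINDOW_MS = 25            # assumption
HOP_MS = 10               # assumption
N_MFCC = 13               # assumption


def mfcc_shape(sample_rate_hz, duration_s, window_ms, hop_ms, n_mfcc):
    """Return (n_frames, n_mfcc) for a clip framed without padding."""
    n_samples = round(sample_rate_hz * duration_s)
    window = sample_rate_hz * window_ms // 1000   # samples per frame
    hop = sample_rate_hz * hop_ms // 1000         # samples between frame starts
    if n_samples < window:
        # Clip is shorter than one frame, so no full frame fits.
        return (0, n_mfcc)
    n_frames = 1 + (n_samples - window) // hop
    return (n_frames, n_mfcc)


shape = mfcc_shape(SAMPLE_RATE_HZ, AVG_DURATION_S, WINDOW_MS, HOP_MS, N_MFCC)
print(shape)
```

Under these assumed settings an average clip yields a 188 x 13 matrix, which a CNN can treat as a small single-channel image. Different window, hop, or coefficient choices would change the shape accordingly.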
