High Precision Speech Keyword Spotting Based on Binary Deep Neural Network in FPGA

Ang Zhang; Jialiang Shi; Hui Qian; Junjie Wang

PMC · DOI: 10.3390/e27111143 · November 7, 2025

Open Access

TL;DR

This paper introduces a new binary neural network model for speech keyword spotting that improves accuracy while using fewer resources on IoT devices.

Contribution

A novel Probability Smoothing Enhanced Binarized Neural Network (PSE-BNN) is proposed to balance accuracy and computational efficiency for FPGA deployment.
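As background on why binarized networks suit FPGA deployment, the sketch below shows the standard XNOR-popcount identity used in generic BNN arithmetic: the dot product of two ±1 vectors of length n equals 2 × (number of agreeing positions) − n. It is not taken from the PSE-BNN design; the helper names `pack_bits` and `binary_dot` and the example vectors are assumptions made for illustration.

```python
def pack_bits(signs):
    """Pack a list of +1/-1 values into an int, setting bit i when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, w_word, n):
    """Dot product of two packed +/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_word ^ w_word) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches


# Hypothetical example vectors (not from the paper).
activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(a * w for a, w in zip(activations, weights))
packed = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
print(reference, packed)
```

Both values agree, which is why a multiply-accumulate over single-bit operands can be replaced by bitwise logic and a bit count in hardware.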

Findings

01. PSE-BNN achieves 97.29% accuracy on the Google Speech Commands Dataset.

02. The model uses 65% fewer hardware resources compared to state-of-the-art BNN-KWS designs.

03. The smoothing filter reduces noise-induced entropy and improves signal-to-noise ratio.
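To illustrate the general idea of smoothing a keyword score over time, here is a minimal sketch using a causal moving average. This is a generic technique swapped in for illustration, not the paper's probability smoothing filter, whose details are not given on this page. The score trace, window length, and threshold are all hypothetical.

```python
def moving_average(values, window):
    """Causal moving average over the most recent `window` values."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def detections(scores, threshold):
    """Indices of frames whose score reaches the wake-up threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical per-frame keyword scores: one isolated noise spike at frame 2,
# then a sustained true keyword from frame 6 to frame 9.
noisy = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.8, 0.9, 0.9, 0.8, 0.1, 0.1]
smoothed = moving_average(noisy, window=3)

raw_hits = detections(noisy, threshold=0.5)
smooth_hits = detections(smoothed, threshold=0.5)
print(raw_hits)
print(smooth_hits)
```

In this toy trace the isolated spike no longer crosses the threshold after smoothing, while the sustained keyword still does, at the cost of a one-frame delay.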

Abstract

Deep Neural Networks (DNNs) are the primary approach for enhancing the real-time performance and accuracy of Keyword Spotting (KWS) systems in speech processing. However, the exceptional performance of DNN-KWS faces significant challenges related to computational intensity and storage requirements, severely limiting its deployment on resource-constrained Internet of Things (IoT) edge devices. Researchers have sought to mitigate these demands by employing Binary Neural Networks (BNNs) through single-bit quantization, albeit at the cost of reduced recognition accuracy. From an information-theoretic perspective, binarization, as a form of lossy compression, increases the uncertainty (Shannon entropy) in the model’s output, contributing to the accuracy degradation. Unfortunately, even a slight accuracy degradation can trigger frequent false wake-ups in the KWS module, leading to substantial…
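The uncertainty the abstract refers to can be made concrete with the Shannon entropy of a classifier's output distribution, H(p) = −Σ p_i log2 p_i. The sketch below compares two hypothetical four-class posteriors; the numbers are invented for illustration and are not measurements from the paper.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical posteriors over four keyword classes (not from the paper).
confident = [0.97, 0.01, 0.01, 0.01]   # sharply peaked output
uncertain = [0.55, 0.25, 0.10, 0.10]   # flatter output
uniform = [0.25, 0.25, 0.25, 0.25]     # maximum uncertainty for four classes

h_confident = shannon_entropy(confident)
h_uncertain = shannon_entropy(uncertain)
h_uniform = shannon_entropy(uniform)
print(h_confident, h_uncertain, h_uniform)
```

A flatter posterior has higher entropy, and the uniform distribution over four classes reaches the maximum of 2 bits.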

Figures: 9

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Wireless Signal Modulation Classification