High Precision Speech Keyword Spotting Based on Binary Deep Neural Network in FPGA
Ang Zhang, Jialiang Shi, Hui Qian, Junjie Wang

TL;DR
This paper introduces a new binary neural network model for speech keyword spotting that improves accuracy while using fewer resources on IoT devices.
Contribution
A novel Probability Smoothing Enhanced Binarized Neural Network (PSE-BNN) is proposed to balance accuracy and computational efficiency for FPGA deployment.
Findings
PSE-BNN achieves 97.29% accuracy on the Google Speech Commands Dataset.
The model uses 65% fewer hardware resources than state-of-the-art BNN-KWS designs.
The smoothing filter reduces noise-induced entropy and improves signal-to-noise ratio.
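As a rough illustration of what smoothing output probabilities can do, the sketch below applies a causal moving average to per-frame class posteriors. The summary does not specify the filter PSE-BNN actually uses, so the moving-average window, the function names, and the toy posteriors here are all assumptions made for illustration only. The entropy helper is included only as a way to measure output uncertainty; whether entropy falls in practice depends on the filter and the data.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def smooth_posteriors(frames, window=3):
    """Causal moving average over per-frame class probabilities.

    Assumption: a plain moving average stands in for the paper's
    smoothing filter, whose exact form is not given in this summary.
    """
    n_classes = len(frames[0])
    smoothed = []
    for t in range(len(frames)):
        chunk = frames[max(0, t - window + 1):t + 1]
        smoothed.append(
            [sum(f[k] for f in chunk) / len(chunk) for k in range(n_classes)]
        )
    return smoothed

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical toy posteriors over 3 classes; the third frame is a noisy outlier.
frames = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],
    [0.90, 0.05, 0.05],
    [0.88, 0.07, 0.05],
]

smoothed = smooth_posteriors(frames, window=3)
raw_labels = [argmax(f) for f in frames]
smooth_labels = [argmax(f) for f in smoothed]
```

In this toy run the single-frame flip to class 1 appears in `raw_labels` but not in `smooth_labels`, which is the kind of spurious trigger a KWS system wants to suppress.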
Abstract
Deep Neural Networks (DNNs) are the primary approach for enhancing the real-time performance and accuracy of Keyword Spotting (KWS) systems in speech processing. However, the exceptional performance of DNN-KWS faces significant challenges related to computational intensity and storage requirements, severely limiting its deployment on resource-constrained Internet of Things (IoT) edge devices. Researchers have sought to mitigate these demands by employing Binary Neural Networks (BNNs) through single-bit quantization, albeit at the cost of reduced recognition accuracy. From an information-theoretic perspective, binarization, as a form of lossy compression, increases the uncertainty (Shannon entropy) in the model’s output, contributing to the accuracy degradation. Unfortunately, even a slight accuracy degradation can trigger frequent false wake-ups in the KWS module, leading to substantial…
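To make the single-bit quantization mentioned in the abstract concrete, the following minimal sketch binarizes weights and activations with a sign function and computes their dot product with the XNOR-and-popcount trick commonly used in BNN hardware. It is a generic illustration, not the PSE-BNN architecture; the function names and sample values are hypothetical.

```python
def binarize(values):
    """Sign binarization: map each real value to +1 or -1 (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]

def pack_bits(signs):
    """Pack a +1/-1 vector into an integer, one bit per element (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks positions where the signs agree; each agreement adds +1
    and each disagreement adds -1, so the result is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agreements = bin(~(a_word ^ b_word) & mask).count("1")
    return 2 * agreements - n

# Hypothetical full-precision weights and activations.
weights = [0.7, -0.2, 0.1, -0.9]
activations = [0.3, 0.4, -0.5, -0.1]

w_bin = binarize(weights)
a_bin = binarize(activations)

# Reference result using ordinary multiply-accumulate on the +1/-1 values.
reference = sum(w * a for w, a in zip(w_bin, a_bin))

# Same result using only bitwise operations on packed words.
fast = xnor_popcount_dot(pack_bits(w_bin), pack_bits(a_bin), len(w_bin))
```

Binarization keeps only the sign of each value and discards its magnitude, which is the lossy-compression effect the abstract describes.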
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Wireless Signal Modulation Classification
