BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance
Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Zejun Ma, Jiakai Wang, Jie Luo, Xianglong Liu

TL;DR
BiFSMNv2 is a binary neural network for keyword spotting that comes close to full-precision accuracy while delivering large speed and storage savings on edge hardware, thanks to its binarization-oriented architecture, a dedicated distillation training scheme, and hardware-level optimization.
Contribution
The paper introduces BiFSMNv2, a novel binary neural network with dual-scale activation, frequency-independent distillation, and a learning binarizer, enabling real-network performance on edge devices.
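To make the underlying binarization idea concrete, the following is a minimal, generic sketch of 1-bit binarization in Python with NumPy: sign quantization with a per-tensor scaling factor (XNOR-Net style) plus a clipped straight-through estimator (STE) for the backward pass. This is an illustrative assumption only. It is not BiFSMNv2's dual-scale activation binarizer or its learning binarizer, whose exact formulations are not given on this page, and the names `binarize` and `ste_grad` are hypothetical.

```python
import numpy as np

def binarize(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Generic 1-bit binarization: sign(x) scaled by mean |x|.

    Hypothetical helper for illustration; NOT BiFSMNv2's dual-scale binarizer.
    """
    alpha = float(np.mean(np.abs(x)))    # per-tensor scaling factor
    b = np.where(x >= 0, 1.0, -1.0)      # map 0 to +1 so every entry is +1 or -1
    return b, alpha

def ste_grad(x: np.ndarray, upstream: np.ndarray, clip: float = 1.0) -> np.ndarray:
    """Clipped straight-through estimator (hypothetical helper).

    sign() has zero gradient almost everywhere, so the backward pass treats it
    as the identity inside [-clip, clip] and blocks the gradient outside.
    """
    return upstream * (np.abs(x) <= clip)

x = np.array([0.5, -1.5, 0.0, 2.0])
b, alpha = binarize(x)
approx = alpha * b                       # 1-bit approximation of x
grad = ste_grad(x, np.ones_like(x))      # gradient passed back to x
```

The scaling factor keeps the binarized tensor on roughly the same magnitude as the original, and the STE is what makes gradient-based training of the binarized network possible at all.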
Findings
Outperforms existing binary KWS networks by significant margins.
Shows an accuracy drop of only 1.51% compared to the full-precision model.
Real-world deployment yields 25.1x speedup and 20.2x storage reduction.
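Savings of this kind come from replacing 32-bit floating-point multiply-accumulates with bitwise operations on packed 1-bit values. The sketch below shows the standard XNOR/popcount identity for the dot product of two {-1, +1} vectors; it is a generic illustration under that assumption, not the paper's deployment kernel, and `pack_bits` and `binary_dot` are hypothetical names.

```python
def pack_bits(v: list[int]) -> int:
    """Pack a list of +1/-1 values into an int: bit i is 1 when v[i] == +1."""
    word = 0
    for i, s in enumerate(v):
        if s == 1:
            word |= 1 << i
    return word

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two +1/-1 vectors of length n, from their packed bits.

    Matching signs contribute +1 and mismatches contribute -1, so
    dot = n - 2 * popcount(a XOR b), equivalently 2 * popcount(XNOR) - n.
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches

a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))  # ordinary multiply-accumulate
```

Storing one bit per weight instead of a 32-bit float bounds the weight-storage saving at 32x; the 20.2x above is the figure the authors report for their real deployment.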
Abstract
Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications but suffer from expensive computation and storage. Therefore, network compression technologies like binarization are studied to deploy KWS models on edge devices. In this paper, we present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, pushing it to real-network accuracy performance. First, we present a Dual-scale Thinnable 1-bit-Architecture to recover the representation capability of the binarized computation units by dual-scale activation binarization and to liberate the speedup potential from an overall architecture perspective. Second, we construct a Frequency Independent Distillation scheme for KWS binarization-aware training, which distills the high- and low-frequency components independently to mitigate the information mismatch between full-precision…
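The idea of distilling high- and low-frequency components independently can be sketched as follows. This is a hypothetical illustration, not BiFSMNv2's Frequency Independent Distillation: the choice of features, the FFT-based split along the last axis, the `cutoff` bin index, and the per-band weights `w_low` and `w_high` are all assumptions made here. The only point it shows is that each frequency band gets its own loss term, so one band cannot dominate the other.

```python
import numpy as np

def split_frequency(feat: np.ndarray, cutoff: int) -> tuple[np.ndarray, np.ndarray]:
    """Split features of shape (channels, length) into low/high-frequency parts.

    Hypothetical helper: keeps the first `cutoff` rfft bins as the low band;
    the high band is the remainder, so low + high reconstructs feat.
    """
    spec = np.fft.rfft(feat, axis=-1)
    low_spec = spec.copy()
    low_spec[..., cutoff:] = 0                      # zero out high-frequency bins
    low = np.fft.irfft(low_spec, n=feat.shape[-1], axis=-1)
    return low, feat - low

def freq_independent_distill_loss(student: np.ndarray, teacher: np.ndarray,
                                  cutoff: int = 4, w_low: float = 1.0,
                                  w_high: float = 1.0) -> float:
    """Sum of separate MSE terms for the low and high bands (assumed form)."""
    s_low, s_high = split_frequency(student, cutoff)
    t_low, t_high = split_frequency(teacher, cutoff)
    loss_low = np.mean((s_low - t_low) ** 2)        # low-band mismatch
    loss_high = np.mean((s_high - t_high) ** 2)     # high-band mismatch
    return float(w_low * loss_low + w_high * loss_high)

rng = np.random.default_rng(0)
teacher_feat = rng.standard_normal((3, 16))         # stand-in full-precision features
student_feat = teacher_feat + 0.1 * rng.standard_normal((3, 16))
loss = freq_independent_distill_loss(student_feat, teacher_feat)
```

In a real binarization-aware training loop the teacher would be the full-precision network and the student its binarized counterpart, with this term added to the task loss; how BiFSMNv2 actually selects, normalizes, and weights the components is described in the paper itself.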
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
