Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio
Ahmad AbdulKader, Kareem Nassar, Mohamed El-Geish, Daniel Galvez, Chetan Patil

TL;DR
This paper introduces a cascaded deep neural network approach for keyword spotting in narrow-band, 8 kHz audio acquired in non-IID environments. The cascade handles class imbalance and reduces power consumption through early termination.
Contribution
The work presents a novel cascaded classifier system combining multiple features and multiple-instance learning for improved keyword spotting in challenging audio conditions.
Findings
False negative rate of 6% achieved
False positive rate of 0.75 per hour
Reduced power consumption via early termination
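
As a reading aid for the two rates above, here is a minimal sketch of how such metrics are commonly computed. The function names and the example counts are hypothetical and chosen only so the arithmetic reproduces the reported figures; they are not the paper's evaluation data or code.

```python
def false_negative_rate(missed_keywords: int, total_keywords: int) -> float:
    """Fraction of true keyword occurrences that the detector missed."""
    return missed_keywords / total_keywords


def false_positives_per_hour(false_alarms: int, audio_seconds: float) -> float:
    """Number of spurious detections normalized by hours of audio processed."""
    return false_alarms / (audio_seconds / 3600.0)


# Hypothetical counts (assumptions, not taken from the paper):
# 6 misses out of 100 keyword occurrences, 3 false alarms in 4 hours of audio.
fnr = false_negative_rate(missed_keywords=6, total_keywords=100)
fph = false_positives_per_hour(false_alarms=3, audio_seconds=4 * 3600)
```

With these illustrative counts, `fnr` corresponds to a 6% false negative rate and `fph` to 0.75 false positives per hour.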
Abstract
We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8kHz audio acquired in non-IID environments -- a more challenging task than most state-of-the-art KWS systems face. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The cascaded classifiers handle the task's class imbalance and reduce power consumption on computationally-constrained devices via early termination. The KWS system achieves a false negative rate of 6% at an hourly false positive rate of 0.75.
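
The early-termination idea in the abstract can be illustrated with a minimal sketch. It assumes each cascade stage is a scoring function paired with a rejection threshold; the stage functions and thresholds below are toy stand-ins invented for illustration, not the paper's DNNs or features. An input is rejected as soon as any stage scores it below that stage's threshold, so later (presumably more expensive) stages run only on inputs that survive the earlier ones.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (scoring function, rejection threshold). Both are hypothetical here.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_predict(features: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Run stages in order; stop at the first stage that rejects the input.

    Returns (is_keyword, number_of_stages_evaluated).
    """
    evaluated = 0
    for scorer, threshold in stages:
        evaluated += 1
        if scorer(features) < threshold:
            # Early termination: remaining stages are skipped.
            return False, evaluated
    return True, evaluated


# Toy stages (assumptions): a cheap mean-magnitude gate, then a peak check.
toy_stages: List[Stage] = [
    (lambda x: sum(abs(v) for v in x) / len(x), 0.1),
    (lambda x: max(x), 0.5),
]

quiet = cascade_predict([0.0, 0.01, 0.02], toy_stages)   # rejected by stage 1
weak = cascade_predict([0.2, 0.3, 0.4], toy_stages)      # rejected by stage 2
strong = cascade_predict([0.2, 0.3, 0.9], toy_stages)    # passes both stages
```

Because most audio in a keyword spotting setting contains no keyword, rejecting the bulk of negatives at the first stage is what would save computation, and each later stage sees a less imbalanced mix of positives and negatives.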
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
