Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio

Ahmad AbdulKader; Kareem Nassar; Mohamed El-Geish; Daniel Galvez; Chetan Patil

arXiv:1711.08058 · cs.LG · April 28, 2025 · 1 cites

TL;DR

This paper introduces a cascaded deep neural network approach for keyword spotting in narrow-band, 8 kHz audio acquired in non-IID environments. The cascade handles the task's class imbalance and reduces power consumption on computationally constrained devices via early termination.

Contribution

The work presents a cascaded classifier system that combines Deep Neural Networks, multiple-feature representations, and multiple-instance learning for keyword spotting in narrow-band audio acquired in non-IID environments.

Findings

01 False negative rate of 6% achieved

02 False positive rate of 0.75 per hour

03 Reduced power consumption via early termination

Abstract

We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8 kHz audio acquired in non-IID environments -- a more challenging task than most state-of-the-art KWS systems face. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The cascaded classifiers handle the task's class imbalance and reduce power consumption on computationally constrained devices via early termination. The KWS system achieves a false negative rate of 6% at an hourly false positive rate of 0.75.
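The early-termination idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stage scorers, thresholds, and the names `cascade_predict`, `cheap_stage`, and `costly_stage` are hypothetical stand-ins, and the paper's actual stages are DNNs over multiple feature representations. The sketch only shows how rejecting a window at an early, inexpensive stage avoids running the later, more expensive ones.

```python
from typing import Callable, List, Sequence, Tuple

# Each stage is (scorer, rejection_threshold). Both are illustrative
# assumptions; the paper does not specify these scorers or values.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_predict(window: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Run stages in order; stop at the first stage that rejects the window.

    Returns (is_keyword, number_of_stages_evaluated).
    """
    if not stages:
        raise ValueError("cascade needs at least one stage")
    stages_run = 0
    for scorer, threshold in stages:
        stages_run += 1
        if scorer(window) < threshold:
            # Early termination: later (costlier) stages never run.
            return False, stages_run
    return True, stages_run


def cheap_stage(window: Sequence[float]) -> float:
    """Toy first stage: mean energy, clipped to [0, 1]."""
    energy = sum(x * x for x in window) / len(window)
    return min(1.0, energy)


def costly_stage(window: Sequence[float]) -> float:
    """Toy stand-in for an expensive classifier (e.g. a DNN)."""
    return 1.0 if max(window) > 0.9 else 0.0


STAGES: List[Stage] = [(cheap_stage, 0.1), (costly_stage, 0.5)]

silence = [0.0] * 10   # rejected by the cheap stage alone
medium = [0.5] * 10    # passes the cheap stage, rejected by the costly one
loud = [1.0] * 10      # accepted by both stages

results = [cascade_predict(w, STAGES) for w in (silence, medium, loud)]
```

Because most audio windows contain no keyword, the bulk of the input in a setup like this would be discarded by the first stage, which is the intuition behind both the class-imbalance handling and the power savings the abstract describes.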

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis