QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer

Jinmiao Huang; Waseem Gharbieh; Qianhui Wan; Han Suk Shim; Chul Lee

arXiv:2206.13231 · eess.AS · June 28, 2022

TL;DR

This paper adapts MLPMixer, a pure MLP-based neural network architecture, to query-by-example (QbyE) open-vocabulary keyword spotting. The adapted model outperforms RNN and CNN baselines in challenging 10 dB and 6 dB environments while using fewer parameters.

Contribution

The paper adapts the MLPMixer architecture to open-vocabulary keyword spotting and shows that it is both more accurate and more efficient than existing RNN and CNN models.
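In the query-by-example (QbyE) setting, a user defines a new keyword by speaking a few examples of it. An encoder maps each utterance to a fixed-size embedding, and a later utterance is accepted if its embedding is close enough to the enrolled ones. The sketch below is a minimal illustration under stated assumptions: it scores queries by cosine similarity against the averaged enrollment embedding and uses a hypothetical threshold of 0.7. The names `enroll` and `detect`, the averaging scheme, and the threshold are illustrative, not the paper's scoring procedure (some QbyE systems instead compare frame sequences, for example with dynamic time warping). Random vectors stand in for real encoder outputs.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def enroll(example_embeddings: np.ndarray) -> np.ndarray:
    """Build a keyword template from enrollment embeddings of shape [num_examples, dim].

    Hypothetical scheme: average the unit-normalized embeddings, then renormalize.
    """
    return l2_normalize(l2_normalize(example_embeddings).mean(axis=0))


def detect(query: np.ndarray, template: np.ndarray, threshold: float = 0.7) -> tuple[bool, float]:
    """Accept the query if its cosine similarity to the template reaches the threshold.

    The 0.7 default is an illustrative value, not a tuned operating point.
    """
    score = float(l2_normalize(query) @ template)
    return score >= threshold, score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keyword = rng.normal(size=64)  # stand-in for the encoder's embedding of one keyword
    examples = keyword + 0.1 * rng.normal(size=(3, 64))  # three noisy enrollment takes
    template = enroll(examples)
    print(detect(keyword + 0.1 * rng.normal(size=64), template))  # same keyword
    print(detect(rng.normal(size=64), template))  # unrelated utterance
```

Because only the template changes when a user adds a keyword, the encoder never needs retraining; this is what makes the vocabulary "open". In practice the threshold would be chosen from a target false-accept rate.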

Findings

01

Outperforms RNN and CNN models in challenging 10 dB and 6 dB environments.

02

Achieves better performance on both the public Hey-Snips dataset and a larger internal dataset with 400 speakers.

03

Uses fewer parameters and multiply-accumulate operations (MACs) than the baseline models.
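The parameter and MAC counts of an MLPMixer-style model follow from how each block is built: a token-mixing MLP (two dense layers shared across channels) and a channel-mixing MLP (two dense layers shared across tokens), each preceded by a LayerNorm. The sketch below does this arithmetic for one block, assuming a hypothetical input of T frames by C feature channels with hidden widths D_s (token MLP) and D_c (channel MLP). The function name and the example dimensions are illustrative and are not the paper's configuration; MACs here count only the matrix multiplications, ignoring biases, normalization, and activations.

```python
from dataclasses import dataclass


@dataclass
class MixerBlockCost:
    params: int
    macs: int


def mixer_block_cost(num_tokens: int, channels: int, token_hidden: int, channel_hidden: int) -> MixerBlockCost:
    """Parameter and MAC count for one Mixer block on a [num_tokens, channels] input."""
    # Token-mixing MLP: Linear(T -> D_s) and Linear(D_s -> T), with biases.
    token_params = 2 * num_tokens * token_hidden + token_hidden + num_tokens
    # Channel-mixing MLP: Linear(C -> D_c) and Linear(D_c -> C), with biases.
    channel_params = 2 * channels * channel_hidden + channel_hidden + channels
    # Two LayerNorms over the channel axis, each with a scale and a shift vector.
    norm_params = 2 * 2 * channels
    # The token MLP runs once per channel; the channel MLP runs once per token.
    token_macs = channels * 2 * num_tokens * token_hidden
    channel_macs = num_tokens * 2 * channels * channel_hidden
    return MixerBlockCost(
        params=token_params + channel_params + norm_params,
        macs=token_macs + channel_macs,
    )


if __name__ == "__main__":
    print(mixer_block_cost(num_tokens=100, channels=64, token_hidden=128, channel_hidden=256))
```

With T = 100, C = 64, D_s = 128, and D_c = 256, one block has 59,172 parameters and 4,915,200 MACs. The token-mixing weights scale with T, so this layout ties the model to a fixed number of input frames.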

Abstract

Current keyword spotting systems are typically trained with a large number of pre-defined keywords. Recognizing keywords in an open-vocabulary setting is essential for personalizing smart device interaction. Towards this goal, we propose a pure MLP-based neural network based on MLPMixer, an MLP architecture that effectively replaces the attention mechanism in Vision Transformers. We investigate different ways of adapting the MLPMixer architecture to the query-by-example (QbyE) open-vocabulary keyword spotting task. Comparisons with state-of-the-art RNN and CNN models show that our method achieves better performance in challenging conditions (10 dB and 6 dB environments) on both the publicly available Hey-Snips dataset and a larger-scale internal dataset with 400 speakers. Our proposed model also has fewer parameters and MACs than the baseline models.
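In the original MLPMixer design for images, each block replaces self-attention with two MLPs: a token-mixing MLP applied across the token (patch) axis and a channel-mixing MLP applied across features, each preceded by LayerNorm and wrapped in a residual connection. Below is a minimal NumPy sketch of that block, assuming that spectrogram frames play the role of image patches (an input of shape [T, C], T frames by C features) and that an utterance embedding is obtained by mean pooling over time. The paper compares several adaptations, so this layout, the pooling, the random initialization, and the helper names (`mixer_block`, `init_params`, `embed`) are hypothetical rather than the authors' model.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last (channel) axis, then apply scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mlp(x, w1, b1, w2, b2):
    """Two dense layers with GELU in between, applied along the last axis."""
    return gelu(x @ w1 + b1) @ w2 + b2


def init_params(num_frames, channels, token_hidden, channel_hidden, rng, scale=0.1):
    """Random weights for one block (hypothetical initialization, for shape checks only)."""
    return {
        "ln1_g": np.ones(channels), "ln1_b": np.zeros(channels),
        "tok_w1": rng.normal(scale=scale, size=(num_frames, token_hidden)),
        "tok_b1": np.zeros(token_hidden),
        "tok_w2": rng.normal(scale=scale, size=(token_hidden, num_frames)),
        "tok_b2": np.zeros(num_frames),
        "ln2_g": np.ones(channels), "ln2_b": np.zeros(channels),
        "ch_w1": rng.normal(scale=scale, size=(channels, channel_hidden)),
        "ch_b1": np.zeros(channel_hidden),
        "ch_w2": rng.normal(scale=scale, size=(channel_hidden, channels)),
        "ch_b2": np.zeros(channels),
    }


def mixer_block(x, p):
    """One Mixer block on x of shape [T, C] (frames x feature channels)."""
    # Token mixing: transpose to [C, T] so the MLP mixes information across
    # time frames, with the same weights shared by every channel.
    y = layer_norm(x, p["ln1_g"], p["ln1_b"])
    x = x + mlp(y.T, p["tok_w1"], p["tok_b1"], p["tok_w2"], p["tok_b2"]).T
    # Channel mixing: the MLP mixes features within each frame, shared across frames.
    y = layer_norm(x, p["ln2_g"], p["ln2_b"])
    return x + mlp(y, p["ch_w1"], p["ch_b1"], p["ch_w2"], p["ch_b2"])


def embed(x, blocks, final_g, final_b):
    """Apply the blocks in order, then mean-pool over time into one [C] vector."""
    for p in blocks:
        x = mixer_block(x, p)
    return layer_norm(x, final_g, final_b).mean(axis=0)
```

For a 100-frame, 64-channel input, `mixer_block` returns another [100, 64] array and `embed` returns a 64-dimensional vector that a QbyE system could compare against enrolled keyword embeddings. Unlike attention, the token-mixing step uses fixed learned weights over frame positions rather than input-dependent weights, which is what keeps the block to plain matrix multiplications.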
