MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
Yu Xi, Haoyu Li, Xiaoyu Gu, Yidi Jiang, Kai Yu

TL;DR
MFA-KWS is a multi-head frame-asynchronous decoding framework for keyword spotting that pairs a CTC branch with a Token-and-Duration Transducer branch. It reports higher accuracy and faster decoding than frame-synchronous baselines on multiple datasets and remains robust in noisy conditions.
Contribution
The paper proposes MFA-KWS, a keyword-specific framework that combines CTC with a Token-and-Duration Transducer (in place of a conventional RNN-T) and decodes both heads frame-asynchronously, improving keyword spotting accuracy and efficiency.
Findings
Achieves state-of-the-art results on multiple datasets.
Provides 47-63% speed-up over frame-synchronous baselines.
Demonstrates robustness in noisy environments.
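To give some intuition for why frame-asynchronous decoding can be cheaper than frame-synchronous decoding, here is a minimal toy sketch of duration-driven frame skipping in the spirit of a Token-and-Duration Transducer. It is not the paper's decoder: `toy_joint`, the `BLANK` id, and the token and duration values are hypothetical stand-ins chosen for illustration, and the toy only uses durations of at least 1 so the loop always advances.

```python
from typing import Dict, List, Tuple

BLANK = 0  # hypothetical blank token id (assumption, not from the paper)


def toy_joint(t: int) -> Tuple[int, int]:
    """Hypothetical stand-in for a joint network that predicts (token, duration) at frame t."""
    table: Dict[int, Tuple[int, int]] = {
        0: (BLANK, 3),  # leading silence: jump ahead 3 frames
        3: (7, 2),      # emit token 7 and jump 2 frames
        5: (4, 4),      # emit token 4 and jump 4 frames
        9: (BLANK, 3),  # trailing silence
    }
    # Frames not listed in the table default to blank with duration 1.
    return table.get(t, (BLANK, 1))


def frame_async_decode(num_frames: int) -> Tuple[List[int], int]:
    """Greedy decoding that advances by the predicted duration instead of one frame at a time."""
    tokens: List[int] = []
    calls = 0  # number of joint evaluations actually performed
    t = 0
    while t < num_frames:
        token, duration = toy_joint(t)
        calls += 1
        if token != BLANK:
            tokens.append(token)
        t += max(1, duration)  # guard so the toy loop always makes progress
    return tokens, calls


def frame_sync_min_calls(num_frames: int) -> int:
    """A frame-synchronous decoder needs at least one joint evaluation per frame."""
    return num_frames


tokens, calls = frame_async_decode(12)
print(tokens, calls, frame_sync_min_calls(12))  # [7, 4] 4 12
```

In this toy run, 12 frames are covered with 4 joint evaluations instead of at least 12. The actual savings in MFA-KWS depend on the model's predicted durations and on the CTC phone-synchronous branch, neither of which is modeled here.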
Abstract
Keyword spotting (KWS) is essential for voice-driven applications, demanding both accuracy and efficiency. Traditional ASR-based KWS methods, such as greedy and beam search, explore the entire search space without explicitly prioritizing keyword detection, often leading to suboptimal performance. In this paper, we propose an effective keyword-specific KWS framework by introducing a streaming-oriented CTC-Transducer-combined frame-asynchronous system with multi-head frame-asynchronous decoding (MFA-KWS). Specifically, MFA-KWS employs keyword-specific phone-synchronous decoding for CTC and replaces conventional RNN-T with Token-and-Duration Transducer to enhance both performance and efficiency. Furthermore, we explore various score fusion strategies, including single-frame-based and consistency-based methods. Extensive experiments demonstrate the superior performance of MFA-KWS, which…
Taxonomy
Topics: Advanced Text Analysis Techniques · Information Retrieval and Search Behavior · Text and Document Classification Technologies
