Text-aware Speech Separation for Multi-talker Keyword Spotting

Haoyu Li; Baochen Yang; Yu Xi; Linfeng Yu; Tian Tan; Hao Li; Kai Yu

arXiv:2406.12447 · eess.AS · June 19, 2024 · Interspeech

TL;DR

This paper introduces a text-aware training method for multi-talker keyword spotting (KWS) that uses text clues about the keyword to guide speech separation and improve detection accuracy when several speakers overlap.

Contribution

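The paper proposes Text-aware Permutation Determinization Training with a clue-based Speech Separation front-end (TPDT-SS). The method feeds keyword-specific clues into the separation model, which resolves the permutation problem in mixed keyword speech and improves the performance of the KWS backend.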

Findings

01 Significant improvement in multi-talker KWS accuracy.

02 Effective handling of permutation problems in mixed speech.

03 Enhanced performance after fine-tuning on unseen data.

Abstract

For noisy environments, ensuring the robustness of keyword spotting (KWS) systems is essential. While much research has focused on noisy KWS, less attention has been paid to multi-talker mixed speech scenarios. Unlike the usual cocktail party problem where multi-talker speech is separated using speaker clues, the key challenge here is to extract the target speech for KWS based on text clues. To address it, this paper proposes a novel Text-aware Permutation Determinization Training method for multi-talker KWS with a clue-based Speech Separation front-end (TPDT-SS). Our research highlights the critical role of SS front-ends and shows that incorporating keyword-specific clues into these models can greatly enhance the effectiveness. TPDT-SS shows remarkable success in addressing permutation problems in mixed keyword speech, thereby greatly boosting the performance of the backend.…
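The abstract does not spell out the training objective, so the following is only a minimal sketch of the permutation problem it refers to, not the authors' implementation. It contrasts a standard permutation-invariant loss, which accepts any output ordering, with a hypothetical fixed-order loss that always expects the keyword-bearing source on output channel 0. The function names, the MSE criterion, the toy signals, and the "keyword goes to channel 0" rule are all assumptions made for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(outputs, targets):
    """Permutation-invariant loss: score every output-to-target assignment
    and keep the cheapest one. Returns (loss, best_permutation)."""
    best = None
    for perm in permutations(range(len(targets))):
        loss = sum(mse(outputs[i], targets[p]) for i, p in enumerate(perm)) / len(perm)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


def fixed_order_loss(outputs, targets, keyword_index):
    """Hypothetical determinized loss (an assumption, not the paper's formula):
    the target that contains the keyword is always assigned to output
    channel 0; the remaining targets keep their original order."""
    order = [keyword_index] + [i for i in range(len(targets)) if i != keyword_index]
    loss = sum(mse(outputs[i], targets[t]) for i, t in enumerate(order)) / len(order)
    return loss, tuple(order)


# Toy "signals": target 1 is the one that carries the keyword.
targets = [[0.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
# The separator reproduced both sources perfectly, but left the keyword
# speech on channel 1 instead of channel 0.
outputs = [[0.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0]]

pit, pit_perm = pit_loss(outputs, targets)
fixed, fixed_perm = fixed_order_loss(outputs, targets, keyword_index=1)
print("PIT loss:", pit, "assignment:", pit_perm)
print("Fixed-order loss:", fixed, "assignment:", fixed_perm)
```

In this toy case the permutation-invariant loss is zero, so training would give the model no reason to place the keyword speech on a predictable channel, and a downstream KWS backend could not know which output to listen to. The fixed-order loss penalizes the same outputs, which is the kind of pressure a determinized assignment would add.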

Code & Models

Repositories

gnafiy/tpdt-ss-kws (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Softmax · Attention Is All You Need