Multi-task Learning with Cross Attention for Keyword Spotting

Takuya Higuchi; Anmol Gupta; Chandra Dhir

arXiv:2107.07634 · eess.AS · September 23, 2021

TL;DR

This paper introduces a cross attention decoder within a multi-task learning framework for keyword spotting, so that the keyword classifier can draw on phonetic information learned from ASR data. It reports a 12% reduction in false reject ratios.

Contribution

The paper proposes a novel cross attention decoder for multi-task learning in KWS, enhancing information sharing between phonetic and keyword recognition tasks.

Findings

01. Achieves a 12% reduction in false reject ratios

02. Outperforms conventional multi-task learning methods

03. Demonstrates effectiveness on KWS benchmarks

Abstract

Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for automatic speech recognition (ASR), there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data. In this approach, an output of an acoustic model is split into two branches for the two tasks, one for phoneme transcription trained with the ASR data and one for keyword classification trained with the KWS data. In this paper, we introduce a cross attention decoder in the multi-task learning framework. Unlike the conventional multi-task learning approach with the simple split of the output layer, the cross attention…
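
The conventional setup described in the abstract, one shared acoustic model whose output is split into a phoneme branch and a keyword branch, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the one-layer tanh encoder, the layer sizes, the random weights, and the mean pooling in the keyword branch are all hypothetical and are not taken from the paper. The `cross_attention` function is generic scaled dot-product attention of a single query over the encoder outputs, included to show the basic mechanism; it is not the paper's decoder, whose details are not given in the text above.

```python
import math
import random

def linear(x, w, b):
    """Apply y = W x + b to one feature vector (lists of floats)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init(rows, cols, rng):
    """Small random weight matrix (placeholder for trained weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def shared_encoder(frames, w, b):
    """Hypothetical one-layer acoustic model: tanh(W x + b) per frame."""
    return [[math.tanh(v) for v in linear(x, w, b)] for x in frames]

def phoneme_branch(hidden, w, b):
    """Per-frame phoneme posteriors (the branch trained with ASR data)."""
    return [softmax(linear(h, w, b)) for h in hidden]

def keyword_branch(hidden, w, b):
    """Utterance-level keyword posterior (the branch trained with KWS data).
    Mean pooling over time is an assumption made for this sketch."""
    dim = len(hidden[0])
    pooled = [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]
    return softmax(linear(pooled, w, b))

def cross_attention(query, keys, values):
    """Generic scaled dot-product attention of one query over a sequence."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights

# Toy dimensions: T frames, F input features, H hidden units,
# P phoneme classes, K keyword classes (all hypothetical).
rng = random.Random(0)
T, F, H, P, K = 5, 4, 6, 3, 2
frames = [[rng.uniform(-1, 1) for _ in range(F)] for _ in range(T)]

hidden = shared_encoder(frames, init(H, F, rng), [0.0] * H)
phone_post = phoneme_branch(hidden, init(P, H, rng), [0.0] * P)
kw_post = keyword_branch(hidden, init(K, H, rng), [0.0] * K)

# One query vector attending over all encoder frames.
query = [rng.uniform(-0.1, 0.1) for _ in range(H)]
context, attn = cross_attention(query, hidden, hidden)
```

In the split-branch version, the two tasks interact only through the shared encoder weights. An attention step like the one above instead lets a query weight individual encoder frames before a decision is made, which is the general idea behind using attention to pass information between representations.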

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing