Towards noise robust trigger-word detection with contrastive learning pre-task for fast on-boarding of new trigger-words

Sivakumar Balasubramanian; Aditya Jajodia; Gowtham Srinivasan

arXiv:2111.03971 · cs.SD · July 28, 2022

Open Access

TL;DR

This paper proposes contrastive learning techniques, including a novel self-supervised method, to improve trigger-word detection robustness and reduce data requirements for new trigger-words in voice assistants.

Contribution

The paper introduces contrastive learning as a pre-training task for trigger-word detection, including a new self-supervised method, so that the detection model generalizes to new trigger-words and noise conditions with less data.

Findings

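01. Supervised and self-supervised contrastive pre-training perform comparably to traditional classification pre-training on new trigger-words.

02. Self-supervised contrastive pre-training on chunked words from long sentence audio reduces the data needed for a new trigger-word.

03. Contrastive pre-training helps the detection model generalize across noise conditions.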

Abstract

Trigger-word detection plays an important role as the entry point of a user's communication with voice assistants. But supporting a particular word as a trigger-word requires a huge amount of data collection, augmentation, and labelling for that word. This makes supporting new trigger-words a tedious and time-consuming process. To combat this, we explore the use of contrastive learning as a pre-training task that helps the detection model generalize to different words and noise conditions. We explore supervised contrastive techniques and also propose a novel self-supervised training technique using chunked words from long sentence audio. We show that both the supervised and the new self-supervised contrastive pre-training techniques achieve results comparable to traditional classification pre-training on new trigger-words when less data is available.
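The abstract does not state which contrastive objective the authors use, so the following is only a minimal sketch of one common choice, a supervised contrastive loss over a batch of embeddings. The function name `supervised_contrastive_loss`, the `temperature` value, and the use of NumPy are assumptions made for illustration, not details from the paper. For the self-supervised setting, one could plausibly pass chunk identifiers as `labels`, with each audio chunk appearing as two differently augmented views; that pairing scheme is likewise an assumption.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's code).

    embeddings: (n, d) array, one row per audio sample, with n >= 2.
    labels:     (n,) array; rows sharing a label are treated as positives.
    Returns the mean loss over anchors that have at least one positive.
    If no anchor has a positive, the result is NaN.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax over all *other* samples, computed stably.
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability assigned to each anchor's positives.
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    mean_pos_log_prob = pos_log_prob[has_pos] / pos_count[has_pos]
    return float(-mean_pos_log_prob.mean())
```

Minimising a loss of this kind pulls embeddings with the same label together and pushes the rest apart, which is the property a pre-trained encoder would need before being fine-tuned on a small amount of data for a new trigger-word.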

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Contrastive Learning