Personalized Keyphrase Detection using Speaker and Environment Information
Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng (Arden) Huang, Arun Narayanan, Ian McGraw

TL;DR
This paper presents a customizable streaming keyphrase detection system that uses speaker verification, speaker separation, and adaptive noise cancellation to improve accuracy under noisy conditions.
Contribution
It combines an end-to-end trained ASR model with a text-independent speaker verification model, a speaker separation frontend, and adaptive noise cancellation for robust keyphrase detection in noisy environments.
Findings
Speaker verification reduces false triggers.
Speaker separation and noise cancellation decrease false rejections.
The system performs well across various noisy conditions.
Abstract
In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary. The system is implemented with an end-to-end trained automatic speech recognition (ASR) model and a text-independent speaker verification model. To address the challenge of detecting these keyphrases under various noisy conditions, a speaker separation model is added to the feature frontend of the speaker verification model, and an adaptive noise cancellation (ANC) algorithm is included to exploit cross-microphone noise coherence. Our experiments show that the text-independent speaker verification model largely reduces the false triggering rate of the keyphrase detection, while the speaker separation model and adaptive noise cancellation largely reduce false rejections.
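The abstract describes two signals that together decide whether a keyphrase fires: the ASR output must contain the phrase, and the speaker verification model must accept the speaker. The sketch below illustrates that gating idea only. It is not the authors' implementation: the function names, the use of cosine similarity between embeddings, and the 0.7 threshold are all assumptions made for illustration, and the ASR transcript and speaker embeddings are taken as given inputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contains_keyphrase(transcript: str, keyphrase: str) -> bool:
    """True if the keyphrase appears as a contiguous word sequence in the transcript."""
    words = transcript.lower().split()
    target = keyphrase.lower().split()
    n = len(target)
    if n == 0:
        return False
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def should_trigger(
    transcript: str,
    keyphrase: str,
    utterance_embedding: Sequence[float],
    enrolled_embedding: Sequence[float],
    threshold: float = 0.7,  # hypothetical value, not taken from the paper
) -> bool:
    """Fire only if the phrase was recognized AND the speaker matches the enrolled user."""
    if not contains_keyphrase(transcript, keyphrase):
        return False
    score = cosine_similarity(utterance_embedding, enrolled_embedding)
    return score >= threshold
```

In this sketch the speaker check can only veto a detection, which mirrors the abstract's claim that speaker verification reduces false triggers. Speaker separation and noise cancellation would act earlier, on the audio features, and are not modeled here.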
