Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments

Joe Caroselli; Arun Narayanan; Yiteng Huang

arXiv:2205.08555·eess.AS·May 19, 2022

Open Access

TL;DR

This paper introduces two streaming, noise context-aware speech enhancement algorithms for multi-talker environments, improving automatic speech recognition accuracy on smart devices by effectively handling interfering speech.

Contribution

It presents novel multi-microphone algorithms that leverage noise context and hotword detection, with an adaptive selection mechanism for enhanced speech recognition in multi-talker scenarios.

Findings

01. Achieves a 55% relative word error rate (WER) reduction at -12 dB SNR

02. Achieves a 43% relative WER reduction at 12 dB SNR

03. The two algorithms are complementary and effective in real-time multi-talker environments
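For readers unfamiliar with the metric, a relative WER reduction compares the error rate with enhancement against the error rate without it. The sketch below shows only the arithmetic. The function name and the two WER values are hypothetical and chosen so the result is easy to check; they are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates on the same scale (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical WERs (NOT taken from the paper), used only to illustrate
# what a 55% relative reduction means.
example = relative_wer_reduction(baseline_wer=40.0, enhanced_wer=18.0)
print(round(example, 1))  # prints 55.0
```

A relative reduction is therefore not a drop in percentage points: in this made-up case the absolute WER falls by 22 points, which is 55% of the baseline.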

Abstract

One of the most challenging scenarios for smart speakers is the multi-talker case, in which target speech from the desired speaker is mixed with interfering speech from one or more other speakers. A smart assistant needs to determine which voice to recognize and which to ignore, and it needs to do so in a streaming, low-latency manner. This work presents two multi-microphone speech enhancement algorithms targeted at this scenario. Targeting on-device use cases, we assume that the algorithm has access to the signal before the hotword, which is referred to as the noise context. The first is the Context Aware Beamformer, which uses the noise context and the detected hotword to determine how to target the desired speaker. The second is an adaptive noise cancellation algorithm called Speech Cleaner, which trains a filter using the noise context. It is demonstrated that the two algorithms are complementary in the…
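The idea of training a cancellation filter on the noise context can be illustrated with a deliberately simplified sketch. It is not the paper's Speech Cleaner: it assumes two microphones, real-valued time-domain signals, a single-tap filter fitted by least squares, and a known boundary between noise context and query. All names, mixing gains, and signal shapes are hypothetical.

```python
import math
import random


def fit_noise_filter(primary, reference):
    """Least-squares single-tap gain w such that w * reference ~= primary."""
    num = sum(p * r for p, r in zip(primary, reference))
    den = sum(r * r for r in reference)
    if den == 0.0:
        raise ValueError("reference has no energy")
    return num / den


def cancel(primary, reference, w):
    """Subtract the filtered reference channel from the primary channel."""
    return [p - w * r for p, r in zip(primary, reference)]


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


random.seed(0)
n_context, n_query = 2000, 2000  # assumed: hotword boundary is known

# Interferer is active throughout; target is silent during the noise context.
noise = [random.gauss(0.0, 1.0) for _ in range(n_context + n_query)]
target = [0.0] * n_context + [0.3 * math.sin(0.05 * t) for t in range(n_query)]

# Hypothetical mixing: target and interferer reach the mics with different gains.
mic1 = [s + v for s, v in zip(target, noise)]
mic2 = [0.2 * s + 0.8 * v for s, v in zip(target, noise)]

# Train on the noise context only, then freeze the filter for the query.
w = fit_noise_filter(mic1[:n_context], mic2[:n_context])
enhanced = cancel(mic1[n_context:], mic2[n_context:], w)

print("filter gain:", w)
print("query power before:", power(mic1[n_context:]))
print("query power after: ", power(enhanced))
```

Because the filter is fitted while only the interferer is active, it learns the interferer's inter-microphone relationship. Applying it during the query removes the interferer and leaves a scaled copy of the target, since the target reaches the two microphones with a different gain ratio. A practical system would more likely operate per frequency bin with multi-tap filters and would have to cope with sensor noise and changing acoustics, none of which this toy example models.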

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis

Methods: Attentive Walk-Aggregating Graph Neural Network