A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data
Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar

TL;DR
This paper introduces a neural acoustic echo canceller optimized for speech recognition rather than for speech enhancement metrics. It uses an augmented loss function and a large synthetic dataset to substantially reduce word error rates (WERs) on real-world data.
Contribution
The paper proposes a novel training approach that combines an ASR-aware term in the loss function, augmentation of the training set with synthetic data, and domain adaptation techniques.
Findings
57% WER improvement over the signal processing baseline
45% WER improvement over the standard neural AEC model
Effective domain adaptation with SpecAugment masks
Abstract
We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs. Previous work has proposed building acoustic echo cancellation (AEC) models for this task that optimize speech enhancement metrics using both neural network as well as signal processing approaches. Since our goal is to recognize the input speech, we consider enhancements which improve word error rates (WERs) when the predicted speech signal is passed to an automatic speech recognition (ASR) model. First, we augment the loss function with a term that produces outputs useful to a pre-trained ASR model and show that this augmented loss function improves WER metrics. Second, we demonstrate that augmenting our training dataset of real world examples with a…
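The augmented loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the ASR term, so everything here is an assumption made for illustration: the enhancement term is a mean squared error on feature frames, the "pre-trained ASR model" is replaced by a hypothetical frozen encoder (a fixed random projection followed by tanh), and the names `make_frozen_asr_encoder`, `augmented_loss`, and `asr_weight` are not from the paper.

```python
import math
import random


def mean_squared_error(a, b):
    """MSE between two equally shaped lists of frames (each frame a list of floats)."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def make_frozen_asr_encoder(feature_dim, hidden_dim, seed=0):
    """Hypothetical stand-in for a pre-trained ASR encoder.

    The weights are drawn once and never updated, mimicking a frozen model.
    """
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 1.0) / math.sqrt(feature_dim) for _ in range(hidden_dim)]
        for _ in range(feature_dim)
    ]

    def encode(frames):
        # Project each frame to hidden_dim units and squash with tanh.
        return [
            [
                math.tanh(sum(value * weights[i][j] for i, value in enumerate(frame)))
                for j in range(hidden_dim)
            ]
            for frame in frames
        ]

    return encode


def augmented_loss(predicted, target, asr_encoder, asr_weight=0.5):
    """Enhancement loss plus a weighted term computed in the ASR encoder's output space.

    asr_weight is an assumed hyperparameter balancing the two terms.
    """
    enhancement_term = mean_squared_error(predicted, target)
    asr_term = mean_squared_error(asr_encoder(predicted), asr_encoder(target))
    return enhancement_term + asr_weight * asr_term


if __name__ == "__main__":
    clean = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.4]]      # target speech features
    estimate = [[0.2, -0.1, 0.3], [0.4, 0.1, -0.5]]   # AEC model output
    encoder = make_frozen_asr_encoder(feature_dim=3, hidden_dim=2)
    print(augmented_loss(estimate, clean, encoder))
```

In this sketch, setting `asr_weight` to zero recovers the plain enhancement objective, so the second term is what steers the echo canceller toward outputs that look right to the recognizer and not only toward outputs that are close to the clean signal.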
