A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data

Nathan Howard; Alex Park; Turaj Zakizadeh Shabestary; Alexander Gruenstein; Rohit Prabhavalkar

arXiv:2106.00856·eess.AS·June 3, 2021

TL;DR

This paper introduces a neural acoustic echo canceller trained for speech recognition rather than for speech enhancement metrics alone. An ASR-aware loss term and a large synthetic dataset together reduce word error rates in real-world scenarios.

Contribution

It proposes a training approach that combines an ASR-aware loss term with large-scale synthetic training data, using SpecAugment-style masks for domain adaptation.

Findings

01. 57% improvement over signal processing baseline

02. 45% improvement over standard neural AEC

03. Effective domain adaptation with SpecAugment masks
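
The masking idea in finding 03 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the masks are applied to a (frames x bins) feature matrix of the known playback signal, and the function name `mask_reference` and the parameters `max_time_width` and `max_freq_width` are hypothetical.

```python
import random


def mask_reference(ref_feats, rng, max_time_width=10, max_freq_width=8):
    """Zero one random time band and one random frequency band.

    ref_feats is a list of frames, each a list of per-bin feature values.
    The input is left untouched; a masked copy is returned.
    Names and default widths are illustrative assumptions.
    """
    frames = len(ref_feats)
    bins = len(ref_feats[0])
    masked = [row[:] for row in ref_feats]

    # Time mask: blank out a contiguous run of frames.
    t_width = rng.randint(0, min(max_time_width, frames))
    t_start = rng.randint(0, frames - t_width)
    for t in range(t_start, t_start + t_width):
        masked[t] = [0.0] * bins

    # Frequency mask: blank out a contiguous run of bins in every frame.
    f_width = rng.randint(0, min(max_freq_width, bins))
    f_start = rng.randint(0, bins - f_width)
    for row in masked:
        for f in range(f_start, f_start + f_width):
            row[f] = 0.0

    return masked


rng = random.Random(0)
ref = [[1.0] * 40 for _ in range(50)]  # toy features: 50 frames, 40 bins
masked = mask_reference(ref, rng)
```

Applying a fresh random mask to each training example means the model never sees a fully reliable playback signal, which is one plausible reading of why such masks would help a model trained on synthetic data carry over to real recordings.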

Abstract

We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs. Previous work has proposed building acoustic echo cancellation (AEC) models for this task that optimize speech enhancement metrics, using both neural network and signal processing approaches. Since our goal is to recognize the input speech, we consider enhancements which improve word error rates (WERs) when the predicted speech signal is passed to an automatic speech recognition (ASR) model. First, we augment the loss function with a term that produces outputs useful to a pre-trained ASR model and show that this augmented loss function improves WER metrics. Second, we demonstrate that augmenting our training dataset of real world examples with a…
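
The augmented loss described in the abstract can be pictured as a weighted sum of an enhancement term and an ASR-derived term. The sketch below is only an assumption about its general shape: the abstract does not give the exact formula, so the weight `asr_weight`, the helper names, and the toy stand-in for the pre-trained ASR model are all hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def augmented_loss(pred, target, asr_loss_fn, asr_weight=0.1):
    """Enhancement loss plus a weighted ASR-derived term.

    asr_loss_fn stands in for a loss computed with a frozen, pre-trained
    ASR model; its form and the weight are assumptions for illustration.
    """
    return mse(pred, target) + asr_weight * asr_loss_fn(pred, target)


def toy_encoder(feats):
    """Placeholder for an ASR model's internal representation."""
    return [2.0 * v for v in feats]


def toy_asr_loss(pred, target):
    """Compare the placeholder ASR representations of pred and target."""
    return mse(toy_encoder(pred), toy_encoder(target))


loss = augmented_loss([0.5, 1.0], [0.0, 1.0], toy_asr_loss, asr_weight=0.5)
enhancement_only = augmented_loss([0.5, 1.0], [0.0, 1.0], toy_asr_loss, asr_weight=0.0)
```

Setting `asr_weight` to zero recovers a plain enhancement objective, which makes it easy to compare a model trained with and without the ASR term.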
