A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

Tom O'Malley; Arun Narayanan; Quan Wang; Alex Park; James Walker; Nathan Howard

arXiv:2111.09935·eess.AS·November 22, 2021

TL;DR

This paper introduces a Conformer-based frontend that jointly performs acoustic echo cancellation, speech enhancement, and speech separation within a single multi-input neural network, significantly improving ASR robustness under noise and echo.

Contribution

The joint model integrates three speech processing tasks into a single neural network, performing nearly as well as the task-specific models while improving robustness in challenging acoustic environments.

Findings

01

Reduces word error rate by at least 71% relative to the noisy baseline in low-SNR conditions on the echo cancellation dataset.

02

Performs within 10% of the task-specific model on the echo cancellation dataset.

03

Significantly improves ASR accuracy in noisy and multi-speaker scenarios.
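The findings above are relative word error rate (WER) comparisons. The short sketch below shows the arithmetic, assuming "reduces by X%" means (baseline WER minus system WER) divided by baseline WER, and "within X% of task-specific models" means the joint model's WER exceeds the task-specific WER by at most X% of the latter. The WER values are hypothetical placeholders, not numbers from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction versus a baseline, as a fraction (0.71 == 71% lower)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer


def relative_gap(reference_wer: float, system_wer: float) -> float:
    """How far a system's WER lies above a reference WER, relative to the reference."""
    if reference_wer <= 0:
        raise ValueError("reference WER must be positive")
    return (system_wer - reference_wer) / reference_wer


# Hypothetical WERs (%), NOT taken from the paper.
noisy_baseline = 40.0  # ASR on unprocessed noisy/echoic audio
task_specific = 10.5   # ASR behind a dedicated single-task frontend
joint = 11.0           # ASR behind the joint frontend

print(f"reduction vs. baseline: {relative_wer_reduction(noisy_baseline, joint):.1%}")  # 72.5%
print(f"gap to task-specific:   {relative_gap(task_specific, joint):.1%}")            # 4.8%
```

Under these assumptions, a joint model at 11.0% WER would count both as a reduction of at least 71% over the baseline and as within 10% of the task-specific model.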

Abstract

We present a frontend for improving robustness of automatic speech recognition (ASR) that jointly implements three modules within a single model: acoustic echo cancellation, speech enhancement, and speech separation. This is achieved by using a contextual enhancement neural network that can optionally make use of different types of side inputs: (1) a reference signal of the playback audio, which is necessary for echo cancellation; (2) a noise context, which is useful for speech enhancement; and (3) an embedding vector representing the voice characteristic of the target speaker of interest, which is not only critical in speech separation, but also helpful for echo cancellation and speech enhancement. We present detailed evaluations to show that the joint model performs almost as well as the task-specific models, and significantly reduces word error rate in noisy conditions even when…
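The abstract describes one network whose three side inputs are each optional. A minimal sketch of what such an input interface could look like follows. It is an illustration under assumptions, not the paper's implementation: all names (FrontendInputs, prepare_side_inputs, context_frames, embed_dim) are hypothetical, the noise context is assumed to be audio preceding the utterance, and absent side inputs are filled with zeros as a simple stand-in convention that the paper may not use. The Conformer network itself is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

Frames = List[List[float]]  # [num_frames][feature_dim], e.g. log-mel features


@dataclass
class FrontendInputs:
    """Inputs to a hypothetical joint AEC / enhancement / separation frontend."""
    mic: Frames                                      # microphone features, always present
    reference: Optional[Frames] = None               # playback reference (echo cancellation)
    noise_context: Optional[Frames] = None           # assumed: audio preceding the utterance
    speaker_embedding: Optional[List[float]] = None  # target-speaker vector (e.g. a d-vector)


def zeros(rows: int, cols: int) -> Frames:
    return [[0.0] * cols for _ in range(rows)]


def prepare_side_inputs(x: FrontendInputs, context_frames: int, embed_dim: int) -> dict:
    """Give the network a fixed interface by zero-filling absent side inputs.

    Assumption: zeros stand for "no side input"; this is an illustrative
    convention, not necessarily the mechanism used in the paper.
    """
    num_frames, feat_dim = len(x.mic), len(x.mic[0])
    reference = x.reference if x.reference is not None else zeros(num_frames, feat_dim)
    if len(reference) != num_frames:
        raise ValueError("reference must be frame-aligned with the microphone signal")
    noise_context = (x.noise_context if x.noise_context is not None
                     else zeros(context_frames, feat_dim))
    embedding = (x.speaker_embedding if x.speaker_embedding is not None
                 else [0.0] * embed_dim)
    return {
        "mic": x.mic,
        "reference": reference,
        "noise_context": noise_context,
        "speaker_embedding": embedding,
        # Presence flags record which side inputs were real rather than zero-filled.
        "has_reference": x.reference is not None,
        "has_noise_context": x.noise_context is not None,
        "has_speaker_embedding": x.speaker_embedding is not None,
    }


# Usage: a pure speech-enhancement request that supplies only a noise context.
mic = [[0.1, 0.2, 0.3]] * 4
inputs = prepare_side_inputs(
    FrontendInputs(mic=mic, noise_context=[[0.05, 0.0, 0.1]] * 2),
    context_frames=2, embed_dim=8)
print(inputs["has_reference"], len(inputs["reference"]), len(inputs["speaker_embedding"]))  # False 4 8
```

Keeping a single fixed input signature is what lets one set of weights serve echo cancellation, enhancement, and separation requests; which side inputs are present then determines which task the model effectively performs.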
