A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation
Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard

TL;DR
This paper introduces a Conformer-based, multi-input neural network frontend that jointly performs acoustic echo cancellation, speech enhancement, and speech separation, significantly improving ASR robustness in the presence of noise and echo.
Contribution
The novel joint model integrates three speech processing tasks into a single neural network, performing nearly as well as task-specific models while improving robustness in challenging acoustic environments.
Findings
Reduces word error rate by at least 71% in low signal-to-noise ratio (SNR) conditions.
Performs within 10% of task-specific models on echo cancellation.
Significantly improves ASR accuracy in noisy and multi-speaker scenarios.
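The percentages above read most naturally as relative word error rate (WER) changes; the summary does not state this explicitly, so treat it as an assumption. The minimal sketch below shows both computations: the reduction against a noisy baseline and the gap between the joint model and a task-specific model. The function names and WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, model_wer: float) -> float:
    """Fraction by which model_wer improves on baseline_wer (e.g. 0.71 = 71%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - model_wer) / baseline_wer


def relative_gap(task_specific_wer: float, joint_wer: float) -> float:
    """Fraction by which the joint model's WER exceeds a task-specific model's."""
    if task_specific_wer <= 0:
        raise ValueError("task-specific WER must be positive")
    return (joint_wer - task_specific_wer) / task_specific_wer


# Hypothetical WERs in percent, chosen only to illustrate the arithmetic.
baseline, task_specific, joint = 40.0, 10.0, 10.8
print(f"reduction vs. baseline: {relative_wer_reduction(baseline, joint):.0%}")
print(f"gap vs. task-specific: {relative_gap(task_specific, joint):.0%}")
```

With these made-up numbers the joint model would meet both headline criteria: a 73% reduction against the baseline and an 8% relative gap to the task-specific model.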
Abstract
We present a frontend for improving the robustness of automatic speech recognition (ASR) that jointly implements three modules within a single model: acoustic echo cancellation, speech enhancement, and speech separation. This is achieved by using a contextual enhancement neural network that can optionally make use of different types of side inputs: (1) a reference signal of the playback audio, which is necessary for echo cancellation; (2) a noise context, which is useful for speech enhancement; and (3) an embedding vector representing the voice characteristics of the target speaker, which is not only critical for speech separation but also helpful for echo cancellation and speech enhancement. We present detailed evaluations to show that the joint model performs almost as well as the task-specific models, and significantly reduces word error rate in noisy conditions even when…
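The abstract describes a single network whose three side inputs are each optional. The sketch below shows one simple way to present such inputs to one model, assuming frame-level log-mel features, all-zero placeholders for absent side inputs, mean-pooling of the noise context, and per-frame broadcasting of the speaker embedding. The function name `assemble_frontend_input`, the feature sizes, and all of these design choices are hypothetical. This is not the paper's Conformer architecture, which the abstract does not detail.

```python
from typing import Optional

import numpy as np


def assemble_frontend_input(
    mic: np.ndarray,
    reference: Optional[np.ndarray] = None,
    noise_context: Optional[np.ndarray] = None,
    speaker_embedding: Optional[np.ndarray] = None,
    embedding_dim: int = 256,  # hypothetical embedding size
) -> np.ndarray:
    """Stack microphone features with optional side inputs, frame by frame.

    mic:               (T, F) features of the microphone signal.
    reference:         (T, F) features of the playback (echo reference) signal.
    noise_context:     (T_ctx, F) features of audio preceding the utterance.
    speaker_embedding: (D,) vector describing the target speaker's voice.
    Missing side inputs become zeros, so one model can serve every task.
    """
    num_frames, num_bins = mic.shape
    if reference is None:
        reference = np.zeros((num_frames, num_bins), dtype=mic.dtype)
    if reference.shape != mic.shape:
        raise ValueError("reference must be frame-aligned with mic")
    # Summarize the noise context as one vector (a simplifying assumption).
    if noise_context is None:
        noise_summary = np.zeros(num_bins, dtype=mic.dtype)
    else:
        noise_summary = noise_context.mean(axis=0)
    if speaker_embedding is None:
        speaker_embedding = np.zeros(embedding_dim, dtype=mic.dtype)
    # Repeat the per-utterance vectors for every frame, then concatenate.
    utterance_vec = np.concatenate([noise_summary, speaker_embedding])
    per_frame = np.tile(utterance_vec, (num_frames, 1))
    return np.concatenate([mic, reference, per_frame], axis=1)


rng = np.random.default_rng(0)
mic = rng.standard_normal((100, 128)).astype(np.float32)
echo_ref = rng.standard_normal((100, 128)).astype(np.float32)
features = assemble_frontend_input(mic, reference=echo_ref)
print(features.shape)  # (100, 640): 128 mic + 128 reference + 128 noise + 256 speaker
```

Zero-filling keeps the input layout fixed whichever side inputs are present, which is one straightforward way a single model could cover echo cancellation, enhancement, and separation.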
