Y$^2$-Net FCRN for Acoustic Echo and Noise Suppression
Ernst Seidel, Jan Franzen, Maximilian Strake, Tim Fingscheidt

TL;DR
This paper introduces Y$^2$-Net, a two-stage deep learning model for joint acoustic echo cancellation (AEC) and noise suppression. It is built from fully convolutional recurrent networks (FCRNs) and improves speech quality under echo and noise conditions.
Contribution
The paper proposes a novel two-stage FCRN-based model that estimates the echo separately before performing noise suppression, improving speech quality over existing single-stage methods.
Findings
Achieved an average improvement of 0.46 points in the DECMOS metric over the baseline.
Demonstrated competitive performance in the Interspeech 2021 AEC Challenge.
Validated the effectiveness of separate echo estimation in a deep neural network framework.
Abstract
In recent years, deep neural networks (DNNs) have been studied as an alternative to traditional acoustic echo cancellation (AEC) algorithms. The proposed models achieved remarkable performance for the separate tasks of AEC and residual echo suppression (RES). A promising network topology is a fully convolutional recurrent network (FCRN) structure, which has already proven its performance on both noise suppression and AEC tasks, individually. However, combining AEC, postfiltering, and noise suppression into a single network typically leads to a noticeable decline in the quality of the near-end speech component due to the lack of a separate loss for echo estimation. In this paper, we propose a two-stage model (Y$^2$-Net) which consists of two FCRNs, each with two inputs and one output (Y-Net). The first stage (AEC) yields an echo estimate, which - as a novelty for a DNN AEC model - is…
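The two-stage data flow and the idea of a separate echo loss can be illustrated with a small sketch. This is a minimal, hypothetical Python example, not the authors' implementation: `two_stage_forward`, `two_stage_loss`, and `echo_weight` are illustrative names, the trained FCRNs are replaced by trivial stand-in functions, and because the abstract above is truncated, the second stage's inputs (microphone signal plus first-stage echo estimate) are an assumption. The point is only to show two stages with two inputs and one output each, and a training objective that contains a dedicated echo term next to the speech term.

```python
import math
from typing import Callable, List, Tuple

Signal = List[float]
# Hypothetical type: one "Y-Net" stage maps two input signals to one output signal.
Stage = Callable[[Signal, Signal], Signal]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two non-empty, equal-length signals."""
    if len(a) != len(b) or not a:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def two_stage_forward(mic: Signal, ref: Signal,
                      stage1: Stage, stage2: Stage) -> Tuple[Signal, Signal]:
    """Run the assumed two-stage data flow; return (echo estimate, enhanced speech)."""
    echo_hat = stage1(mic, ref)         # stage 1 (AEC): microphone + far-end reference -> echo estimate
    speech_hat = stage2(mic, echo_hat)  # stage 2 (assumed inputs): microphone + echo estimate -> speech
    return echo_hat, speech_hat


def two_stage_loss(echo_hat: Signal, echo_true: Signal,
                   speech_hat: Signal, speech_true: Signal,
                   echo_weight: float = 0.5) -> float:
    """Weighted sum of a separate echo loss and a near-end speech loss (weighting is hypothetical)."""
    return (echo_weight * mse(echo_hat, echo_true)
            + (1.0 - echo_weight) * mse(speech_hat, speech_true))


def oracle_echo_stage(mic: Signal, ref: Signal) -> Signal:
    """Stand-in for stage 1: assumes a known echo path that is a fixed gain of 0.5."""
    return [0.5 * x for x in ref]


def subtract_stage(mic: Signal, echo_hat: Signal) -> Signal:
    """Stand-in for stage 2: plain echo subtraction, no noise suppression."""
    return [m - e for m, e in zip(mic, echo_hat)]


if __name__ == "__main__":
    # Toy signals: near-end speech plus an echo produced by the same 0.5 gain (no noise).
    speech = [1.0, 0.0, -1.0]
    ref = [0.2, 0.4, 0.6]
    echo = [0.5 * r for r in ref]
    mic = [s + e for s, e in zip(speech, echo)]

    echo_hat, speech_hat = two_stage_forward(mic, ref, oracle_echo_stage, subtract_stage)
    loss = two_stage_loss(echo_hat, echo, speech_hat, speech)
    print(f"loss = {loss:.3e}")
```

With these oracle stand-ins both loss terms are essentially zero up to floating-point rounding; in an actual model each stage would be a trained FCRN, and the second stage would also have to suppress noise and residual echo.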
