TL;DR
This paper introduces a real-time complex neural network model using frequency-time-LSTMs for improved acoustic echo cancellation and speech enhancement, outperforming existing baselines in noisy and reverberant conditions.
Contribution
The paper proposes a novel F-T-LSTM-based complex neural network for joint AEC and speech enhancement, trained with a modified SI-SNR loss function to improve echo cancellation and noise suppression.
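As a point of reference for the loss mentioned above, here is a minimal NumPy sketch of the standard scale-invariant SNR (SI-SNR). This summary does not spell out the paper's modification, so the code shows only the commonly used baseline definition; the function names `si_snr` and `si_snr_loss` and the `eps` parameter are illustrative assumptions, not the authors' code.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Standard SI-SNR in dB between two 1-D signals (not the paper's modified variant)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "clean" component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    # Whatever is left over is treated as residual noise/echo.
    e_noise = estimate - s_target
    ratio = np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio + eps)

def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)

# Hypothetical usage: a clean signal with mild and heavy additive noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
mild = clean + 0.1 * rng.standard_normal(16000)
heavy = clean + 1.0 * rng.standard_normal(16000)
mild_score = si_snr(mild, clean)
heavy_score = si_snr(heavy, clean)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score unchanged, which is the property the "scale-invariant" name refers to.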
Findings
Outperforms baseline by 0.27 MOS
Uses only 1.4 million parameters
Effective in noisy and reverberant environments
Abstract
With the increasing demand for audio communication and online conferencing, ensuring the robustness of Acoustic Echo Cancellation (AEC) under complicated acoustic scenarios, including noise, reverberation and nonlinear distortion, has become a top issue. Although some traditional methods consider nonlinear distortion, they remain inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information, together with frequency-time-LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function to give the model better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the…
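To illustrate what "scanning both the frequency and time axes" can mean in practice, the sketch below shows how a feature tensor might be rearranged so that a recurrent layer sees either frequency bins or time frames as its sequence dimension. The `(batch, channels, freq, time)` layout and both function names are assumptions made for illustration; the abstract does not specify the authors' actual tensor layout or layer configuration.

```python
import numpy as np

def frequency_scan_layout(x):
    """Rearrange (batch, channels, freq, time) so each time frame
    becomes one sequence over frequency bins: (batch*time, freq, channels)."""
    b, c, f, t = x.shape
    return x.transpose(0, 3, 2, 1).reshape(b * t, f, c)

def time_scan_layout(x):
    """Rearrange (batch, channels, freq, time) so each frequency bin
    becomes one sequence over time frames: (batch*freq, time, channels)."""
    b, c, f, t = x.shape
    return x.transpose(0, 2, 3, 1).reshape(b * f, t, c)

# Hypothetical feature map: 2 utterances, 3 channels, 4 frequency bins, 5 frames.
features = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
freq_sequences = frequency_scan_layout(features)  # sequences run along frequency
time_sequences = time_scan_layout(features)       # sequences run along time
```

In a real-time setting the time-axis scan would need to be causal (for example, a unidirectional LSTM), whereas the frequency-axis scan operates within a single frame and adds no look-ahead.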
