Multi-Scale Attention Neural Network for Acoustic Echo Cancellation
Lu Ma, Song Yang, Yaguang Gong, Zhongqin Wu

TL;DR
This paper introduces an end-to-end multi-scale attention neural network for acoustic echo cancellation (AEC) that models nonlinear distortions and background noise, and outperforms traditional linear adaptive filters in noisy and nonlinear-distortion scenarios.
Contribution
It presents a novel end-to-end neural network architecture combining temporal convolution, attention mechanisms, and LSTM for improved AEC performance.
Findings
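Outperforms traditional methods in echo suppression, measured by echo return loss enhancement (ERLE)
Achieves higher speech quality scores, measured by perceptual evaluation of speech quality (PESQ), in noisy conditions
Remains effective in nonlinear distortion scenarios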
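For readers unfamiliar with the first metric, ERLE is conventionally defined as the ratio, in dB, of the microphone signal power to the residual power after cancellation, measured over segments where only far-end echo is present. The sketch below uses that standard definition; the function name `erle_db`, the `eps` guard, and the synthetic signals are illustrative assumptions, and the paper's exact evaluation protocol is not given on this page.

```python
import numpy as np

def erle_db(mic, residual, eps=1e-12):
    """ERLE in dB: 10 * log10(mean mic power / mean residual power).

    Assumes both arrays cover a far-end single-talk segment, so the
    microphone signal is echo only. `eps` (an assumption) avoids log(0).
    """
    mic = np.asarray(mic, dtype=float)
    residual = np.asarray(residual, dtype=float)
    mic_power = np.mean(mic ** 2) + eps
    residual_power = np.mean(residual ** 2) + eps
    return 10.0 * np.log10(mic_power / residual_power)

# Synthetic example: a canceller that leaves 10% of the echo amplitude.
rng = np.random.default_rng(0)
echo = rng.standard_normal(16000)
residual = 0.1 * echo
erle = erle_db(echo, residual)
print(f"ERLE: {erle:.2f} dB")
```

A residual at one tenth of the echo amplitude carries one hundredth of its power, which corresponds to roughly 20 dB of ERLE; higher values mean stronger echo suppression.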
Abstract
Acoustic Echo Cancellation (AEC) plays a key role in speech interaction by suppressing the echo received at the microphone, which is introduced by acoustic reverberations from loudspeakers. Since the performance of linear adaptive filters (AFs) degrades severely due to nonlinear distortions, background noise, and microphone clipping in real scenarios, deep learning has been employed for AEC for its good nonlinear modelling ability. In this paper, we construct an end-to-end multi-scale attention neural network for AEC. Temporal convolution is first used to transform the waveform into a spectrogram. The spectrograms of the far-end reference and the near-end mixture are concatenated and fed to a temporal convolution network (TCN) with stacked dilated convolution layers. An attention mechanism is applied across the representations from different layers to adaptively extract relevant features by…
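The data flow described in the abstract (concatenate far-end and near-end features, pass them through stacked dilated convolutions, then attend over the per-layer outputs) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' model: the weights are random, the waveform-to-spectrogram encoder and the LSTM are omitted, and the attention form (dot-product scores against the deepest layer, softmax over layers) is a guess, since the abstract is truncated before it specifies the mechanism. All function names and sizes are hypothetical.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """Causal dilated 1-D convolution over time.

    x: (T, C_in) features, w: (K, C_in, C_out) kernel.
    Output frame t depends only on x[t - k * dilation] for k = 0..K-1.
    """
    K, _, c_out = w.shape
    T = x.shape[0]
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    y = np.zeros((T, c_out))
    for k in range(K):
        start = pad - k * dilation  # tap k looks k * dilation frames back
        y += xp[start:start + T] @ w[k]
    return y

def multi_scale_attention(layer_outputs):
    """Fuse per-layer features with a softmax over layers (assumed form).

    layer_outputs: list of L arrays of shape (T, C).
    Returns fused features (T, C) and attention weights (T, L).
    """
    H = np.stack(layer_outputs, axis=1)            # (T, L, C)
    q = H[:, -1, :]                                # query: deepest layer (assumption)
    scores = np.einsum("tc,tlc->tl", q, H) / np.sqrt(H.shape[-1])
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    fused = np.einsum("tl,tlc->tc", weights, H)
    return fused, weights

rng = np.random.default_rng(0)
T, C = 50, 8                                       # frames, channels (hypothetical sizes)
far_end = rng.standard_normal((T, C))              # stand-in for far-end features
near_end = rng.standard_normal((T, C))             # stand-in for near-end mixture features
x = np.concatenate([far_end, near_end], axis=1)    # (T, 2C)

# 1x1 projection back to C channels, then a stack with growing dilation.
h = dilated_causal_conv(x, 0.1 * rng.standard_normal((1, 2 * C, C)), 1)
dilations = (1, 2, 4, 8)
layer_outputs = []
for d in dilations:
    kernel = 0.1 * rng.standard_normal((3, C, C))
    h = h + np.maximum(dilated_causal_conv(h, kernel, d), 0.0)  # residual + ReLU
    layer_outputs.append(h)

fused, weights = multi_scale_attention(layer_outputs)
print(fused.shape, weights.shape)
```

Each layer sees a different temporal span because its dilation differs, so the softmax weights let every frame choose how much short-range versus long-range context to use. That is one plausible reading of "multi-scale attention" here.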
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Image and Signal Denoising Methods
