Full Attention Bidirectional Deep Learning Structure for Single Channel Speech Enhancement
Yuzi Yan, Wei-Qiang Zhang, Michael T. Johnson

TL;DR
This paper adds a "full" attention mechanism to a bidirectional sequence-to-sequence model for single-channel speech enhancement, so that each focal frame can draw on latent information from the frames after it. The model achieves better speech quality (PESQ) than OM-LSA, CNN-LSTM, T-GSA, and a unidirectional attention-based LSTM baseline.
Contribution
It presents a new bidirectional attention-based architecture that extends the previous attention-based RNN method and improves speech enhancement performance.
Findings
Outperforms OM-LSA, CNN-LSTM, T-GSA, and the unidirectional attention-based LSTM baseline in speech quality, as measured by PESQ
Uses a "full" attention mechanism to exploit latent information in the frames after each focal frame
Abstract
As the cornerstone of other important technologies, such as speech recognition and speech synthesis, speech enhancement is a critical area in audio signal processing. In this paper, a new deep learning structure for speech enhancement is demonstrated. The model introduces a "full" attention mechanism to a bidirectional sequence-to-sequence method to make use of latent information after each focal frame. This is an extension of the previous attention-based RNN method. The proposed bidirectional attention-based architecture achieves better performance in terms of speech quality (PESQ), compared with OM-LSA, CNN-LSTM, T-GSA and the unidirectional attention-based LSTM baseline.
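The difference between attending only to past frames and attending to frames after the focal frame as well can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the scoring function, so scaled dot-product scoring is an assumption here, and the names `attention_context`, `hidden`, and `full` are hypothetical. The hidden states are random stand-ins for the outputs of a recurrent encoder.

```python
import numpy as np

def attention_context(hidden, full=True):
    """Attention over frame-level hidden states (illustrative only).

    hidden: array of shape (T, D), one hidden vector per frame.
    full=False: focal frame t attends to frames 0..t only.
    full=True:  focal frame t attends to every frame, including later ones.
    Returns (context, weights) with shapes (T, D) and (T, T).
    """
    T, D = hidden.shape
    # Assumption: scaled dot-product similarity between frames.
    scores = hidden @ hidden.T / np.sqrt(D)
    if not full:
        # Mask out frames that come after the focal frame.
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ hidden, weights

rng = np.random.default_rng(0)
hidden = rng.standard_normal((6, 4))  # 6 frames, 4-dimensional states
ctx_causal, w_causal = attention_context(hidden, full=False)
ctx_full, w_full = attention_context(hidden, full=True)
```

In the restricted case the weight matrix is lower triangular, so frames after the focal frame contribute nothing to its context vector. In the full case every entry is positive, which is the sense in which later frames become available to the model.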
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
