Full Attention Bidirectional Deep Learning Structure for Single Channel Speech Enhancement

Yuzi Yan; Wei-Qiang Zhang; Michael T. Johnson

arXiv:2108.12105 · cs.SD · August 30, 2021

TL;DR

This paper proposes a bidirectional sequence-to-sequence model with a "full" attention mechanism for single-channel speech enhancement. By using latent information from after each focal frame, it achieves better speech quality (PESQ) than OM-LSA, CNN-LSTM, T-GSA, and a unidirectional attention-based LSTM baseline.

Contribution

It presents a bidirectional attention-based architecture that extends the previous attention-based RNN method and improves speech enhancement performance.

Findings

01

Outperforms OM-LSA, CNN-LSTM, T-GSA, and a unidirectional attention-based LSTM baseline in speech quality

02

Uses a full attention mechanism to leverage latent information from after each focal frame

03

Achieves higher PESQ scores than the baseline models

Abstract

As the cornerstone of other important technologies, such as speech recognition and speech synthesis, speech enhancement is a critical area in audio signal processing. In this paper, a new deep learning structure for speech enhancement is demonstrated. The model introduces a "full" attention mechanism to a bidirectional sequence-to-sequence method to make use of latent information after each focal frame. This is an extension of the previous attention-based RNN method. The proposed bidirectional attention-based architecture achieves better performance in terms of speech quality (PESQ), compared with OM-LSA, CNN-LSTM, T-GSA and the unidirectional attention-based LSTM baseline.
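The difference between unidirectional and "full" attention described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring, the function names (`attention_weights`, `context_vector`), and the toy `hidden` vectors are assumptions made for illustration. In the paper the per-frame states would come from the bidirectional encoder, and its actual scoring function is not given here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(hidden, t, full=True):
    """Attention weights of focal frame t over the sequence.

    full=True  -> frame t attends to every frame, including later ones.
    full=False -> frame t attends only to frames 0..t (unidirectional).
    Dot-product scoring is an assumption, not taken from the paper.
    """
    last = len(hidden) if full else t + 1
    query = hidden[t]
    scores = [sum(q * k for q, k in zip(query, hidden[j])) for j in range(last)]
    weights = softmax(scores)
    # Pad with zeros so both variants return one weight per frame.
    return weights + [0.0] * (len(hidden) - last)


def context_vector(hidden, weights):
    """Weighted sum of the per-frame states."""
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


# Toy stand-ins for encoder states of a 4-frame utterance (hypothetical values).
hidden = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]

causal = attention_weights(hidden, t=1, full=False)
full = attention_weights(hidden, t=1, full=True)
context = context_vector(hidden, full)
```

With `full=False`, the frames after the focal frame receive zero weight, so any information they carry is ignored. With `full=True`, they contribute to the context vector, which is the "latent information after each focal frame" the abstract refers to.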

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory