Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

Dengfeng Ke; Jinsong Zhang; Yanlu Xie; Yanyan Xu; Binghuai Lin

arXiv:2105.02509·cs.SD·May 7, 2021

TL;DR

This paper proposes a compact neural network for single-channel speech enhancement that outperforms larger models. It introduces separable polling attention, uses global layer normalization followed by PReLU, and replaces BLSTM with Conv2d, shrinking the model from 33M to 5M parameters while improving CSIG, PESQ, and COVL scores.

Contribution

The paper introduces three techniques that shrink the PHASEN model while improving its speech enhancement performance: separable polling attention, global layer normalization followed by PReLU, and replacing BLSTM with Conv2d.

Findings

1. Model size reduced from 33M to 5M parameters.

2. Performance improved with higher CSIG, PESQ, and COVL scores.

3. Achieved state-of-the-art results with a smaller model.

Abstract

Single-channel speech enhancement is a challenging task in the speech community. Recently, various neural-network-based methods have been applied to speech enhancement. Among these models, PHASEN and T-GSA achieve state-of-the-art performance on the publicly available VoiceBank+DEMAND corpus. Both models reach a COVL score of 3.62. PHASEN achieves the highest CSIG score of 4.21, while T-GSA achieves the highest PESQ score of 3.06. However, both models are very large. The tension between model performance and model size is hard to reconcile. In this paper, we introduce three techniques to shrink the PHASEN model and improve its performance. First, separable polling attention is proposed to replace the frequency transformation blocks in PHASEN. Second, global layer normalization followed by PReLU is used to replace batch normalization followed by…
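To make the second technique concrete, here is a minimal sketch of global layer normalization followed by PReLU in plain Python. It assumes the commonly used formulation of global layer normalization, in which one mean and variance are computed over all channels and all time-frequency bins of a single example, with a learnable per-channel gain and bias. The abstract does not spell out the authors' exact formulation, so the function names (`global_layer_norm`, `prelu`, `gln_prelu`), the data layout (a list of channels, each a flat list of bins), and the parameter values are illustrative assumptions, not the paper's implementation.

```python
import math


def global_layer_norm(x, gamma, beta, eps=1e-8):
    """Normalize one example with statistics shared across all channels and bins.

    x:     list of channels, each a flat list of time-frequency values (assumed layout)
    gamma: per-channel gain, one float per channel
    beta:  per-channel bias, one float per channel
    """
    values = [v for channel in x for v in channel]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [
        [g * (v - mean) * inv_std + b for v in channel]
        for channel, g, b in zip(x, gamma, beta)
    ]


def prelu(x, alpha):
    """Parametric ReLU: keep positive values, scale negative ones by a per-channel slope."""
    return [
        [v if v >= 0.0 else a * v for v in channel]
        for channel, a in zip(x, alpha)
    ]


def gln_prelu(x, gamma, beta, alpha):
    """Global layer normalization followed by PReLU."""
    return prelu(global_layer_norm(x, gamma, beta), alpha)


# Toy feature map: 2 channels, 3 time-frequency bins each (hypothetical values).
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
out = gln_prelu(features, gamma=[1.0, 1.0], beta=[0.0, 0.0], alpha=[0.25, 0.25])
```

Unlike batch normalization, the statistics here depend only on the current example, so the result does not change with batch size or batch composition. In PyTorch, the same per-example statistics can be obtained with `torch.nn.GroupNorm` using a single group, followed by `torch.nn.PReLU`; whether the authors implemented it that way is not stated in the abstract.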

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques

Methods: Batch Normalization · Layer Normalization · Parameterized ReLU