Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU
Dengfeng Ke, Jinsong Zhang, Yanlu Xie, Yanyan Xu, Binghuai Lin

TL;DR
This paper proposes a compact neural network for single-channel speech enhancement that outperforms larger models. It introduces separable polling attention, uses global layer normalization followed by PReLU, and replaces the BLSTM with Conv2d, shrinking the model from 33M to 5M parameters while improving performance.
Contribution
The paper introduces techniques that shrink the PHASEN model while improving its enhancement quality, including separable polling attention and global layer normalization followed by PReLU.
Findings
Model size reduced from 33M to 5M parameters (roughly an 85% reduction).
Performance improved with higher CSIG, PESQ, and COVL scores.
Achieved state-of-the-art results with a smaller model.
Abstract
Single-channel speech enhancement is a challenging task in the speech community. Recently, various neural-network-based methods have been applied to speech enhancement. Among these models, PHASEN and T-GSA achieve state-of-the-art performance on the publicly available VoiceBank+DEMAND corpus. Both models reach a COVL score of 3.62. PHASEN achieves the highest CSIG score of 4.21, while T-GSA achieves the highest PESQ score of 3.06. However, both models are very large, and the trade-off between model performance and model size is hard to reconcile. In this paper, we introduce three kinds of techniques to shrink the PHASEN model and improve its performance. Firstly, separable polling attention is proposed to replace the frequency transformation blocks in PHASEN. Secondly, global layer normalization followed with PReLU is used to replace batch normalization followed with…
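To make the second technique more concrete, the following is a minimal NumPy sketch of global layer normalization followed by PReLU. It is not the authors' implementation. It assumes the common definition in which the statistics are computed over all channels, time frames, and frequency bins of a single example, with a learnable per-channel gain and bias; the function names, the `(channels, time, freq)` layout, and the initial PReLU slope of 0.25 are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def global_layer_norm(x, gamma, beta, eps=1e-8):
    """Normalize one example over ALL of its elements (assumed gLN definition).

    x:     feature map of shape (channels, time, freq)  [assumed layout]
    gamma: per-channel gain, shape (channels,)
    beta:  per-channel bias, shape (channels,)
    """
    mean = x.mean()  # a single mean over channels, time, and frequency
    var = x.var()    # a single variance over the same elements
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[:, None, None] * x_hat + beta[:, None, None]


def prelu(x, alpha):
    """PReLU: identity for x >= 0, per-channel slope alpha for x < 0."""
    return np.where(x >= 0, x, alpha[:, None, None] * x)


# Toy input: 4 channels, 10 time frames, 16 frequency bins (hypothetical sizes).
rng = np.random.default_rng(0)
features = rng.normal(loc=2.0, scale=3.0, size=(4, 10, 16))

gamma = np.ones(4)         # gain, would be learned in a real model
beta = np.zeros(4)         # bias, would be learned in a real model
alpha = np.full(4, 0.25)   # negative slope, would be learned in a real model

normalized = global_layer_norm(features, gamma, beta)
activated = prelu(normalized, alpha)
```

Unlike batch normalization, the statistics here depend only on the current example, so the output does not vary with batch size or with which other utterances share the batch.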
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
Methods: Batch Normalization · Layer Normalization · Parameterized ReLU
