A fully recurrent feature extraction for single channel speech enhancement
Muhammed PV Shifas, Santelli Claudio, Vassilis Tsiaras, Yannis Stylianou

TL;DR
This paper introduces a recurrent CNN-based feature extraction method for single-channel speech enhancement, improving noise differentiation and speech quality in noisy conditions with fewer parameters.
Contribution
It proposes integrating recurrency into CNN layers to enhance feature extraction for speech enhancement, addressing CNN limitations in modeling noise context.
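To make the idea of recurrence inside a convolutional layer concrete, here is a minimal NumPy sketch. It is not the paper's architecture: the update rule `h_t = tanh(conv(x_t, w_in) + conv(h_{t-1}, w_rec) + b)`, the function names `conv1d_same` and `recurrent_conv_layer`, and all shapes are assumptions chosen for illustration. It only shows how the feature map of one frame can depend on the feature maps of earlier frames.

```python
import numpy as np


def conv1d_same(x, w):
    """Zero-padded 1-D cross-correlation along the frequency axis.

    x: (in_channels, n_freq)
    w: (out_channels, in_channels, k), with k odd
    Returns an array of shape (out_channels, n_freq).
    """
    out_ch, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    n_freq = x.shape[1]
    out = np.zeros((out_ch, n_freq))
    for f in range(n_freq):
        # Contract each filter with the (in_channels, k) window at position f.
        out[:, f] = np.tensordot(w, xp[:, f:f + k], axes=([1, 2], [0, 1]))
    return out


def recurrent_conv_layer(frames, w_in, w_rec, b):
    """Hypothetical recurrent convolutional layer (illustrative only).

    frames: (n_frames, in_channels, n_freq), e.g. spectrogram frames
    w_in:   (out_channels, in_channels, k)   filters applied to the input
    w_rec:  (out_channels, out_channels, k)  filters applied to the previous state
    b:      (out_channels,)                  per-channel bias
    Returns feature maps of shape (n_frames, out_channels, n_freq).
    """
    n_frames, _, n_freq = frames.shape
    h = np.zeros((w_in.shape[0], n_freq))  # initial state: all zeros
    outputs = []
    for t in range(n_frames):
        # The current features mix the current frame with the previous state.
        h = np.tanh(conv1d_same(frames[t], w_in) + conv1d_same(h, w_rec) + b[:, None])
        outputs.append(h)
    return np.stack(outputs)
```

With `w_rec` set to zero the layer reduces to an ordinary frame-by-frame convolution, so the recurrent filters are the only path through which past context reaches the current features.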
Findings
Achieved up to 1.5 dB gain in segmental SNR (SSNR) in unseen noise conditions.
Improved subjective quality by 0.4 on the mean opinion score (MOS) scale.
Reduced model parameters by 25%.
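For readers unfamiliar with the SSNR figure above, the following is a rough sketch of how segmental SNR is commonly computed: the per-frame SNR in dB is clipped to a fixed range and then averaged. The frame length, hop size, and the clipping range of -10 to 35 dB are conventional choices assumed here, not values confirmed by the paper, and `segmental_snr` is a hypothetical helper name.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=256, hop=128, lo=-10.0, hi=35.0, eps=1e-10):
    """Mean of per-frame SNR values (dB), each clipped to [lo, hi].

    clean, enhanced: 1-D time-domain signals of equal length.
    frame_len, hop, lo, hi are assumed conventions, not taken from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    if clean.shape != enhanced.shape:
        raise ValueError("clean and enhanced must have the same shape")

    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        err = s - enhanced[start:start + frame_len]
        # eps guards against division by zero and log of zero in silent frames.
        snr_db = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(err ** 2) + eps))
        frame_snrs.append(np.clip(snr_db, lo, hi))

    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return float(np.mean(frame_snrs))
```

An SSNR gain such as the one reported would then be the difference between this score for one model's output and the score for a baseline's output on the same utterances.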
Abstract
Convolutional neural network (CNN) modules are widely used to build high-end speech enhancement neural models. However, the feature extraction power of vanilla CNN modules is limited by the dimensionality constraint of the convolution kernels they integrate; as a result, they cannot adequately model noise context information at the feature extraction stage. To this end, by adding a recurrency factor to the feature-extracting CNN layers, we introduce a robust context-aware feature extraction strategy for single-channel speech enhancement. As shown, adding recurrency captures the local statistics of noise attributes at the level of the extracted features, and thus the suggested model is effective in differentiating speech cues even in very noisy conditions. When evaluated against enhancement models using vanilla CNN modules, in unseen noise conditions, the…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Convolution
