A fully recurrent feature extraction for single channel speech enhancement

Muhammed PV Shifas; Santelli Claudio; Vassilis Tsiaras; Yannis Stylianou

arXiv:2006.05233·eess.AS·June 7, 2021·5 cites

Open Access · 1 Repo

TL;DR

This paper introduces a recurrent CNN-based feature extraction method for single-channel speech enhancement. Compared with vanilla CNN models, it better separates speech from noise and improves speech quality in noisy conditions while using fewer parameters.

Contribution

It proposes integrating recurrency into the feature-extracting CNN layers, addressing the limited ability of vanilla CNN kernels to model noise context.
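
As a rough illustration of what "recurrency inside a convolutional layer" can mean, the sketch below implements a generic GRU-style cell whose gates are computed by 1-D convolutions along the frequency axis, with the hidden state carried from one time frame to the next. This is a minimal sketch under stated assumptions, not the architecture from the paper: the names `ConvGRUCell`, `conv1d_same`, and `run`, the kernel size, the channel counts, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np


def conv1d_same(x, w):
    """Zero-padded 'same' 1-D convolution (cross-correlation).

    x: (length, in_ch), w: (kernel, in_ch, out_ch) -> (length, out_ch)
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for i in range(k):
        # Each kernel tap contributes a shifted, channel-mixed copy of x.
        out += xp[i:i + x.shape[0]] @ w[i]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvGRUCell:
    """Hypothetical GRU cell with convolutional gates (illustration only)."""

    def __init__(self, in_ch, hid_ch, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        shape = (kernel, in_ch + hid_ch, hid_ch)
        # Random, untrained weights: assumed values for the demo.
        self.w_z = rng.normal(0.0, 0.1, shape)  # update gate
        self.w_r = rng.normal(0.0, 0.1, shape)  # reset gate
        self.w_h = rng.normal(0.0, 0.1, shape)  # candidate state

    def step(self, x, h):
        """x: (freq, in_ch) current frame, h: (freq, hid_ch) previous state."""
        xh = np.concatenate([x, h], axis=1)
        z = sigmoid(conv1d_same(xh, self.w_z))
        r = sigmoid(conv1d_same(xh, self.w_r))
        xrh = np.concatenate([x, r * h], axis=1)
        cand = np.tanh(conv1d_same(xrh, self.w_h))
        # Blend the previous state with the candidate, as in a standard GRU.
        return (1.0 - z) * h + z * cand


def run(cell, frames, hid_ch):
    """frames: (time, freq, in_ch) -> features of shape (time, freq, hid_ch)."""
    h = np.zeros((frames.shape[1], hid_ch))
    outputs = []
    for x in frames:
        h = cell.step(x, h)
        outputs.append(h)
    return np.stack(outputs)
```

Unlike a plain convolution, whose output at a frame depends only on inputs inside the kernel's receptive field, the feature at each frame here also depends on all earlier frames through `h`. That is the kind of longer noise context the contribution refers to.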

Findings

01. Achieved up to 1.5 dB segmental SNR (SSNR) gain in unseen noise conditions.

02. Improved subjective quality by 0.4 points on the mean opinion score (MOS) scale.

03. Reduced model parameters by 25%.
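
For readers unfamiliar with the SSNR metric quoted in the findings, the sketch below shows one common way to compute segmental SNR: the per-frame SNR in dB between the clean reference and the processed signal, clipped to a fixed range and averaged over frames. The frame length of 256 samples and the clipping range of -10 to 35 dB are conventional choices assumed here; the paper's exact evaluation settings are not stated in this summary, and `segmental_snr` is a hypothetical helper name.

```python
import numpy as np


def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0, eps=1e-10):
    """Mean per-frame SNR in dB, with each frame clipped to [lo, hi].

    frame_len, lo and hi are assumed conventional values, not the paper's.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = min(len(clean), len(processed)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    n = n_frames * frame_len
    c = clean[:n].reshape(n_frames, frame_len)
    p = processed[:n].reshape(n_frames, frame_len)
    signal_energy = np.sum(c ** 2, axis=1)
    error_energy = np.sum((c - p) ** 2, axis=1)
    snr_db = 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))
    return float(np.mean(np.clip(snr_db, lo, hi)))
```

An "SSNR gain" is then the difference between this score for the enhanced output and for the unprocessed noisy input, both measured against the same clean reference.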

Abstract

Convolutional neural network (CNN) modules are widely used to build high-end speech enhancement neural models. However, the feature extraction power of vanilla CNN modules is limited by the dimensionality constraint of their convolution kernels, so they cannot adequately model noise context information at the feature extraction stage. To address this, we add recurrency to the feature-extracting CNN layers and introduce a robust context-aware feature extraction strategy for single-channel speech enhancement. As shown, adding recurrency captures the local statistics of noise attributes at the level of the extracted features, and thus the suggested model is effective in differentiating speech cues even in very noisy conditions. When evaluated against enhancement models using vanilla CNN modules, in unseen noise conditions, the…

Code & Models

Repositories

shifaspv/gruCNN-speech-enhancement-tensorflow (TensorFlow, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Convolution