A Hybrid Approach for Speech Enhancement Using MoG Model and Neural Network Phoneme Classifier
Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot

TL;DR
This paper introduces a hybrid speech enhancement method combining a generative MoG model with a discriminative neural network to improve speech quality and recognition accuracy in single-microphone scenarios.
Contribution
It presents a novel two-phase hybrid approach integrating MoG and neural network models for more effective speech enhancement.
Findings
Significant improvement in speech quality measures.
Enhanced speech recognition accuracy.
Effective noise suppression in real-world conditions.
Abstract
In this paper we present a single-microphone speech enhancement algorithm. A hybrid approach is proposed that merges the generative mixture of Gaussians (MoG) model and the discriminative neural network (NN). The proposed algorithm is executed in two phases: the training phase, which does not recur, and the test phase. First, the noise-free speech power spectral density (PSD) is modeled as a MoG, representing the phoneme-based diversity in the speech signal. An NN is then trained on a phoneme-labeled database for phoneme classification, with mel-frequency cepstral coefficients (MFCC) as the input features. Given the phoneme classification results, a speech presence probability (SPP) is obtained using both the generative and discriminative models. Soft spectral subtraction is then executed while the noise estimate is simultaneously updated. The discriminative NN maintains the continuity of…
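The test-phase steps named in the abstract (phoneme posteriors, an SPP that draws on both models, then soft spectral subtraction) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, beyond what the abstract states, that each frequency bin of the noisy log-spectrum is the maximum of a speech term and a noise term, that each phoneme contributes one Gaussian per bin, that the noise is Gaussian per bin, and that non-speech bins are attenuated by a fixed amount `beta`. All function and parameter names are hypothetical.

```python
import math


def gauss_pdf(x, mean, var):
    """Gaussian density at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gauss_cdf(x, mean, var):
    """Gaussian CDF at x; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-(x - mean) / math.sqrt(2.0 * var))


def per_phoneme_spp(z, sp_mean, sp_var, n_mean, n_var):
    """P(speech dominates the bin | z, phoneme), assuming z = max(speech, noise)."""
    speech_dominant = gauss_pdf(z, sp_mean, sp_var) * gauss_cdf(z, n_mean, n_var)
    noise_dominant = gauss_cdf(z, sp_mean, sp_var) * gauss_pdf(z, n_mean, n_var)
    total = speech_dominant + noise_dominant
    # Fall back to 0.5 if both terms underflow to zero.
    return speech_dominant / total if total > 0.0 else 0.5


def enhance_frame(z, phoneme_post, sp_means, sp_vars, n_mean, n_var, beta):
    """Return (enhanced log-spectrum, per-bin SPP) for one noisy frame.

    z            : noisy log-spectrum, one value per frequency bin
    phoneme_post : phoneme posteriors, e.g. from a classifier (assumed to sum to 1)
    sp_means/sp_vars : per-phoneme, per-bin Gaussian parameters (generative model)
    n_mean/n_var : per-bin noise parameters (assumed already estimated)
    beta         : assumed attenuation applied where speech is absent
    """
    enhanced, spp = [], []
    for k, z_k in enumerate(z):
        # Weight each phoneme's generative SPP by its posterior probability.
        rho_k = sum(
            p_j * per_phoneme_spp(z_k, sp_means[j][k], sp_vars[j][k], n_mean[k], n_var[k])
            for j, p_j in enumerate(phoneme_post)
        )
        spp.append(rho_k)
        # Soft subtraction: keep the bin when speech is likely, attenuate otherwise.
        enhanced.append(z_k - (1.0 - rho_k) * beta)
    return enhanced, spp


# Toy frame: two bins, two phonemes (all numbers are made up for illustration).
z = [6.0, 0.0]
phoneme_post = [0.8, 0.2]
sp_means = [[6.0, -2.0], [5.0, -3.0]]
sp_vars = [[1.0, 1.0], [1.0, 1.0]]
n_mean, n_var = [0.0, 0.0], [1.0, 1.0]
enhanced, spp = enhance_frame(z, phoneme_post, sp_means, sp_vars, n_mean, n_var, beta=3.0)
```

In this toy frame the first bin lies far above the assumed noise level, so its SPP is close to 1 and the bin is left almost unchanged. The second bin sits at the noise level, so its SPP is small and the bin is attenuated by nearly the full `beta`. The noise-estimate update mentioned in the abstract is omitted here.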
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Music and Audio Processing
