Sparse Mixture of Local Experts for Efficient Speech Enhancement
Aswin Sivaraman, Minje Kim

TL;DR
This paper presents a sparse mixture of local expert neural networks for speech denoising. A gating network routes each noisy input to a specialized sub-model, chosen by speech degradation level or speaker gender, which improves denoising performance and reduces computational complexity.
Contribution
The paper introduces an ensemble model in which a gating network assigns each noisy speech signal to one of several specialist neural networks, improving denoising performance with fewer parameters.
Findings
An ensemble of specialist networks outperforms a generalist network in speech denoising.
The proposed model reduces computational complexity while maintaining high denoising quality.
The gating network effectively classifies inputs into subproblems based on speech degradation level or speaker gender.
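The complexity finding can be made concrete with a rough parameter count. The sketch below uses the textbook parameter count for an LSTM layer (four gates, one bias vector per gate) and compares a generalist against a gated ensemble in which only one specialist runs per input. Every size in it (number of frequency bins, hidden sizes, layer counts, number of specialists, and the choice of an LSTM for the gating network) is a hypothetical value chosen for illustration, not a figure from the paper, and output layers are ignored.

```python
def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Weights and biases of one LSTM layer (four gates, one bias vector per gate)."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def lstm_stack_params(input_size: int, hidden_size: int, num_layers: int) -> int:
    """Total parameters of a stack of LSTM layers with a shared hidden size."""
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += lstm_layer_params(layer_input, hidden_size)
        layer_input = hidden_size  # deeper layers read the previous layer's output
    return total


# Hypothetical sizes, for illustration only (not taken from the paper).
NUM_BINS = 513          # frequency bins per spectrogram frame
NUM_SPECIALISTS = 4

generalist = lstm_stack_params(NUM_BINS, hidden_size=1024, num_layers=2)
specialist = lstm_stack_params(NUM_BINS, hidden_size=256, num_layers=2)
gating = lstm_stack_params(NUM_BINS, hidden_size=64, num_layers=1)

# Stored: everything that must be kept in memory.
stored = gating + NUM_SPECIALISTS * specialist
# Active: what actually runs for one input under hard (sparse) gating.
active = gating + specialist

print(f"generalist parameters:        {generalist:,}")
print(f"ensemble parameters (stored): {stored:,}")
print(f"ensemble parameters (active): {active:,}")
```

With hard gating, the per-input cost depends on the gating network plus a single specialist, not on the number of specialists. Whether the stored ensemble is also smaller than the generalist depends on the sizes chosen.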
Abstract
In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks. By splitting up the speech denoising task into non-overlapping subproblems and introducing a classifier, we are able to improve denoising performance while also reducing computational complexity. More specifically, the proposed model incorporates a gating network which assigns noisy speech signals to an appropriate specialist network based on either speech degradation level or speaker gender. In our experiments, a baseline recurrent network is compared against an ensemble of similarly-designed smaller recurrent networks regulated by the auxiliary gating network. Using stochastically generated batches from a large noisy speech corpus, the proposed model learns to estimate a time-frequency masking matrix based on the magnitude spectrogram of an input…
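The routing described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the specialists and the gating rule below are hypothetical stand-in functions (the paper uses trained recurrent networks), and the names `make_specialist`, `make_gate`, and `denoise` are invented for this example. It shows only the control flow: the gate scores the input, the highest-scoring specialist alone is run, and its mask in (0, 1) is applied element-wise to the noisy magnitude spectrogram.

```python
import math
from typing import Callable, List, Sequence, Tuple

Spectrogram = List[List[float]]  # frames x frequency bins, magnitudes >= 0
MaskFn = Callable[[Spectrogram], Spectrogram]
GateFn = Callable[[Spectrogram], List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_specialist(bias: float) -> MaskFn:
    """Hypothetical stand-in for a trained specialist: returns a mask in (0, 1)."""
    def predict_mask(mag: Spectrogram) -> Spectrogram:
        return [[sigmoid(m - bias) for m in frame] for frame in mag]
    return predict_mask


def make_gate(centers: Sequence[float]) -> GateFn:
    """Hypothetical gating rule: score each specialist by closeness of the
    input's mean magnitude to that specialist's center."""
    def scores(mag: Spectrogram) -> List[float]:
        count = sum(len(frame) for frame in mag)
        mean = sum(sum(frame) for frame in mag) / count
        return [-abs(mean - c) for c in centers]
    return scores


def denoise(mag: Spectrogram, gate: GateFn,
            specialists: Sequence[MaskFn]) -> Tuple[int, Spectrogram]:
    """Hard gating: run only the top-scoring specialist, then apply its mask."""
    s = gate(mag)
    k = max(range(len(s)), key=s.__getitem__)   # argmax over gate scores
    mask = specialists[k](mag)                  # the other specialists never run
    estimate = [[w * x for w, x in zip(mask_row, mag_row)]
                for mask_row, mag_row in zip(mask, mag)]
    return k, estimate


# Usage with toy values (two specialists, a 2x2 "spectrogram").
specialists = [make_specialist(0.5), make_specialist(2.0)]
gate = make_gate(centers=[0.25, 2.0])
noisy = [[0.2, 0.4], [0.1, 0.3]]
chosen, estimate = denoise(noisy, gate, specialists)
print("selected specialist:", chosen)
```

Because the mask values lie strictly between 0 and 1, each estimated magnitude is an attenuated copy of the corresponding noisy magnitude. In the paper's setting the gate would be a trained classifier over degradation level or speaker gender, not the mean-magnitude rule used here.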
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
Methods: Sigmoid Activation · Long Short-Term Memory
