Sparse Mixture of Local Experts for Efficient Speech Enhancement

Aswin Sivaraman; Minje Kim

arXiv:2005.08128·eess.AS·August 11, 2020·1 cites

TL;DR

This paper presents a sparse mixture of local expert neural networks for speech denoising. A gating network routes each noisy signal to a specialized sub-model according to either speech degradation level or speaker gender, which improves denoising performance while reducing computational complexity.

Contribution

The paper introduces a novel ensemble model with a gating network that assigns speech signals to specialized neural networks, enhancing denoising performance with fewer parameters.

Findings

01. Ensemble of specialist networks outperforms a generalist network in speech denoising.

02. The proposed model reduces computational complexity while maintaining high denoising quality.

03. Gating network effectively classifies subproblems based on speech degradation or speaker gender.
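
As a rough illustration of a degradation-based split, the sketch below labels each noisy utterance with a non-overlapping class by measuring its signal-to-noise ratio (SNR) and binning it. The use of SNR as the degradation measure, the bin edges, and the helper names `snr_db`, `mix_at_snr`, and `degradation_class` are all assumptions made for this example; the page does not state how the authors define the degradation levels.

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the additive noise."""
    noise = noisy - clean
    eps = 1e-12  # guards against log(0) and division by zero
    return 10.0 * np.log10((np.sum(clean ** 2) + eps) / (np.sum(noise ** 2) + eps))

def mix_at_snr(clean, noise, target_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR."""
    scale = np.sqrt(np.sum(clean ** 2) / (np.sum(noise ** 2) * 10.0 ** (target_db / 10.0)))
    return clean + scale * noise

def degradation_class(snr, edges=(-2.5, 2.5, 7.5)):
    """Map an SNR value to one of len(edges) + 1 non-overlapping bins.

    The edges are hypothetical, chosen only for illustration.
    """
    return int(np.digitize(snr, edges))

# Toy signals standing in for a speech utterance and a noise recording.
rng = np.random.default_rng(0)
clean = np.sin(2.0 * np.pi * 220.0 * np.arange(16000) / 16000.0)
noise = rng.standard_normal(16000)

noisy = mix_at_snr(clean, noise, target_db=5.0)
label = degradation_class(snr_db(clean, noisy))
```

Labels of this kind could serve as classification targets for a gating network, since every utterance falls into exactly one bin.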

Abstract

In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks. By splitting up the speech denoising task into non-overlapping subproblems and introducing a classifier, we are able to improve denoising performance while also reducing computational complexity. More specifically, the proposed model incorporates a gating network which assigns noisy speech signals to an appropriate specialist network based on either speech degradation level or speaker gender. In our experiments, a baseline recurrent network is compared against an ensemble of similarly-designed smaller recurrent networks regulated by the auxiliary gating network. Using stochastically generated batches from a large noisy speech corpus, the proposed model learns to estimate a time-frequency masking matrix based on the magnitude spectrogram of an input…
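
The routing idea in the abstract can be sketched in a few lines: a gating network picks exactly one specialist, and only that specialist estimates a time-frequency mask, which is then multiplied with the noisy magnitude spectrogram. This is a minimal sketch under stated assumptions, not the authors' implementation. `TinyExpert` and `TinyGate` are hypothetical single-layer stand-ins for the recurrent networks the abstract describes, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyExpert:
    """Placeholder specialist: one linear layer followed by a sigmoid mask.

    The paper uses recurrent networks; this stand-in only mimics the interface.
    """
    def __init__(self, n_freq, rng):
        self.W = 0.1 * rng.standard_normal((n_freq, n_freq))
        self.b = np.zeros(n_freq)
        self.calls = 0  # counts forward passes, to show sparsity

    def mask(self, mag):
        self.calls += 1
        return sigmoid(mag @ self.W + self.b)  # (frames, freq), values in [0, 1]

class TinyGate:
    """Placeholder gating network: time-averaged features to expert logits."""
    def __init__(self, n_freq, n_experts, rng):
        self.W = 0.1 * rng.standard_normal((n_freq, n_experts))

    def choose(self, mag):
        logits = mag.mean(axis=0) @ self.W
        return int(np.argmax(logits))  # hard decision: one expert only

def sparse_denoise(mag, gate, experts):
    """Run only the selected expert and apply its mask to the input."""
    k = gate.choose(mag)
    return k, experts[k].mask(mag) * mag

rng = np.random.default_rng(0)
n_frames, n_freq, n_experts = 50, 257, 4  # illustrative sizes
experts = [TinyExpert(n_freq, rng) for _ in range(n_experts)]
gate = TinyGate(n_freq, n_experts, rng)

noisy_mag = np.abs(rng.standard_normal((n_frames, n_freq)))  # fake spectrogram
k, enhanced_mag = sparse_denoise(noisy_mag, gate, experts)
```

Because the gate makes a hard choice, inference cost is one gate pass plus one small expert pass, regardless of how many experts the ensemble holds.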

Code & Models

Repositories

IU-SAIGE/sparse_mle (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques

Methods: Sigmoid Activation · Long Short-Term Memory