MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data
George Close, Thomas Hain, Stefan Goetze

TL;DR
This paper introduces MetricGAN+/-, which adds a "de-generator" network to make speech enhancement models more robust, yielding better generalization and higher perceptual quality scores on unseen data.
Contribution
The paper proposes a novel de-generator component in MetricGAN+/- to increase robustness and generalization in noise reduction models for unseen data.
Findings
3.8% relative improvement in PESQ score (3.05 vs 3.22) on the VoiceBank-DEMAND dataset
Better generalization to unseen noise and speech
Robustness gained by exposing the metric prediction network to a wider range of metric scores during training
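The last finding can be pictured with a toy sketch. The code below is not the authors' implementation: the normalized scores, the grouping of training audio into four sources, and the assumption that de-generator outputs score below the noisy inputs are all hypothetical, chosen only to show how an extra source of degraded audio could widen the range of targets a metric predictor is trained on.

```python
def score_range(scores):
    """Return the (lowest, highest) metric score in a list of training targets."""
    return min(scores), max(scores)


# Hypothetical normalized metric scores in [0, 1]; 1.0 stands for a clean reference.
clean = [1.0, 1.0, 1.0]
enhanced = [0.72, 0.68, 0.75]     # generator outputs (illustrative values)
noisy = [0.45, 0.52, 0.4]         # unprocessed noisy inputs (illustrative values)
degenerated = [0.2, 0.15, 0.28]   # de-generator outputs, assumed to score lower

# Targets seen by the predictor without and with the extra network.
without_degenerator = clean + enhanced + noisy
with_degenerator = without_degenerator + degenerated

print("without de-generator:", score_range(without_degenerator))
print("with de-generator:   ", score_range(with_degenerator))
```

In this toy setup the lowest target drops from 0.4 to 0.15, so the predictor sees examples from a part of the score range that the other sources never cover.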
Abstract
Training of speech enhancement systems often does not incorporate knowledge of human perception and thus can lead to unnatural sounding results. Incorporating psychoacoustically motivated speech perception metrics as part of model training via a predictor network has recently gained interest. However, the performance of such predictors is limited by the distribution of metric scores that appear in the training data. In this work, we propose MetricGAN+/- (an extension of MetricGAN+, one such metric-motivated system) which introduces an additional network - a "de-generator" which attempts to improve the robustness of the prediction network (and by extension of the generator) by ensuring observation of a wider range of metric scores in training. Experimental results on the VoiceBank-DEMAND dataset show relative improvement in PESQ score of 3.8% (3.05 vs 3.22 PESQ score), as well as better…
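The PESQ figures quoted in the abstract can be checked with a few lines of arithmetic. This is only a sketch: the abstract does not say how the 3.8% was normalized, and the 4.5 upper bound used in the second calculation is an assumption, not a value given in the source.

```python
lower_pesq = 3.05    # lower of the two scores quoted in the abstract
higher_pesq = 3.22   # higher of the two scores quoted in the abstract
ASSUMED_PESQ_MAX = 4.5  # assumption: upper end of the PESQ scale, not from the source

gain = higher_pesq - lower_pesq
rel_to_lower = gain / lower_pesq        # gain relative to the lower score
rel_to_scale = gain / ASSUMED_PESQ_MAX  # gain relative to the assumed scale maximum

print(f"absolute gain:            {gain:.2f}")
print(f"relative to lower score:  {rel_to_lower:.1%}")
print(f"relative to assumed max:  {rel_to_scale:.1%}")
```

Under these assumptions the gain is about 5.6% when measured against the lower score and about 3.8% when measured against the assumed 4.5 maximum, so the second normalization is one that reproduces the quoted figure.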
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
