MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

George Close; Thomas Hain; Stefan Goetze

arXiv:2203.12369 · cs.SD · June 16, 2022 · 1 citation

Open Access

TL;DR

This paper introduces MetricGAN+/- with a de-generator network to enhance the robustness of speech enhancement models, leading to better generalization and improved perceptual quality scores on unseen data.

Contribution

The paper proposes a novel de-generator component in MetricGAN+/- to increase robustness and generalization in noise reduction models for unseen data.

Findings

01 3.8% relative improvement in PESQ score on the VoiceBank-DEMAND dataset (3.05 vs 3.22)

02 Better generalization to unseen noise and speech

03 Robustness achieved by exposing the prediction network to a wider range of metric scores during training

Abstract

Training of speech enhancement systems often does not incorporate knowledge of human perception and thus can lead to unnatural sounding results. Incorporating psychoacoustically motivated speech perception metrics as part of model training via a predictor network has recently gained interest. However, the performance of such predictors is limited by the distribution of metric scores that appear in the training data. In this work, we propose MetricGAN+/- (an extension of MetricGAN+, one such metric-motivated system) which introduces an additional network - a "de-generator" which attempts to improve the robustness of the prediction network (and by extension of the generator) by ensuring observation of a wider range of metric scores in training. Experimental results on the VoiceBank-DEMAND dataset show relative improvement in PESQ score of 3.8% (3.05 vs 3.22 PESQ score), as well as better…
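
The abstract names three networks (generator, prediction network, de-generator) but does not give their objectives. The following is only a minimal sketch, assuming the least-squares objectives commonly used in the MetricGAN family: the predictor regresses onto the true normalised metric score, the generator pushes its predicted score towards 1, and the de-generator pushes its predicted score towards a lower target. All names here (toy_metric, predictor_loss, generator_loss, degenerator_loss) and the target value 0.5 are hypothetical and chosen for illustration; toy_metric is a stand-in, not PESQ.

```python
from typing import Callable, Sequence

Signal = Sequence[float]
# A scorer maps (estimate, clean reference) to a score in [0, 1].
Scorer = Callable[[Signal, Signal], float]


def toy_metric(estimate: Signal, clean: Signal) -> float:
    """Hypothetical stand-in for a normalised perceptual metric (not PESQ)."""
    err = sum((e - c) ** 2 for e, c in zip(estimate, clean)) / len(clean)
    return 1.0 / (1.0 + err)  # 1.0 for a perfect match, lower as error grows


def predictor_loss(predict: Scorer, metric: Scorer, clean: Signal,
                   enhanced: Signal, degraded: Signal) -> float:
    """Assumed objective: the predictor should match the true metric score on
    clean, generator-enhanced, and de-generator-degraded signals."""
    total = (predict(clean, clean) - 1.0) ** 2
    for signal in (enhanced, degraded):
        total += (predict(signal, clean) - metric(signal, clean)) ** 2
    return total


def generator_loss(predict: Scorer, clean: Signal, enhanced: Signal) -> float:
    """Assumed objective: the generator aims for the maximum predicted score."""
    return (predict(enhanced, clean) - 1.0) ** 2


def degenerator_loss(predict: Scorer, clean: Signal, degraded: Signal,
                     target: float) -> float:
    """Assumed objective: the de-generator aims for a lower target score, so
    the predictor sees a wider range of scores during training."""
    return (predict(degraded, clean) - target) ** 2


# Toy signals: the enhanced one is close to clean, the degraded one is far.
clean = [0.0, 1.0, 0.0, -1.0]
enhanced = [0.1, 0.9, 0.0, -1.0]
degraded = [1.0, 0.0, 1.0, 0.0]

# Using the metric itself as a "perfect" predictor gives zero predictor loss.
perfect_predictor_loss = predictor_loss(toy_metric, toy_metric, clean, enhanced, degraded)
gen_loss = generator_loss(toy_metric, clean, enhanced)
degen_loss = degenerator_loss(toy_metric, clean, degraded, target=0.5)
```

In this toy setup the degraded signal scores well below the enhanced one, which illustrates the point made in the abstract: without such low-scoring examples the predictor would only ever be trained on a narrow band of metric values.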

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing