MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement
Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

TL;DR
This paper introduces MetricGAN+, an improved version of MetricGAN for speech enhancement. It optimizes perceptual quality metrics directly and reaches a state-of-the-art PESQ score of 3.15 on the VoiceBank-DEMAND dataset.
Contribution
MetricGAN+ adds three training techniques, each drawing on domain knowledge of speech processing, to the original MetricGAN.
Findings
PESQ score increased by 0.3 over the original MetricGAN
Achieved a state-of-the-art PESQ score of 3.15
Both results were obtained on the VoiceBank-DEMAND dataset
Abstract
The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory. Objective evaluation metrics which consider human perception can hence serve as a bridge to reduce the gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose a MetricGAN+ in which three training techniques incorporating domain-knowledge of speech processing are proposed. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ can increase PESQ score by 0.3 compared to the previous MetricGAN and achieve state-of-the-art results (PESQ score =…
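The abstract's central idea, that a discriminator can stand in for a possibly non-differentiable metric because only the metric's scores are needed, can be illustrated with a minimal sketch. It covers only the basic MetricGAN-style objective, not the three MetricGAN+ techniques. The function names, the squared-error form of the losses, and the normalization of PESQ to [0, 1] over an assumed range of -0.5 to 4.5 are assumptions made for illustration and are not taken from the text above.

```python
# Illustrative sketch only. All names are hypothetical; the loss forms and
# the PESQ range are assumptions, not details stated in the abstract.

def normalize_pesq(pesq, lo=-0.5, hi=4.5):
    """Map a raw PESQ score to [0, 1] (assumed range: -0.5 to 4.5)."""
    return (pesq - lo) / (hi - lo)


def discriminator_loss(d_clean, d_enhanced, metric_score):
    """Train the discriminator D to imitate the metric.

    d_clean      -- D's output for a (clean, clean) pair; target is 1.0
    d_enhanced   -- D's output for an (enhanced, clean) pair
    metric_score -- normalized metric score of the enhanced speech;
                    only this number is needed, so the metric itself
                    never has to be differentiated
    """
    return (d_clean - 1.0) ** 2 + (d_enhanced - metric_score) ** 2


def generator_loss(d_enhanced, target=1.0):
    """Train the generator to push D's predicted score toward the target."""
    return (d_enhanced - target) ** 2


# Example with made-up discriminator outputs.
score = normalize_pesq(3.15)
d_loss = discriminator_loss(d_clean=0.9, d_enhanced=0.6, metric_score=score)
g_loss = generator_loss(d_enhanced=0.6)
```

In this sketch the gradient reaches the generator through the discriminator, which is differentiable, rather than through the metric. That is why the metric can be treated as a black box that only returns scores.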
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
