MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

Szu-Wei Fu; Cheng Yu; Tsun-An Hsieh; Peter Plantinga; Mirco Ravanelli; Xugang Lu; Yu Tsao

arXiv:2104.03538 · cs.SD · June 7, 2021 · 25 cites


TL;DR

This paper introduces MetricGAN+, an improved speech enhancement method that optimizes objective perceptual quality metrics (such as PESQ) through a learned discriminator. It yields better speech quality and state-of-the-art PESQ scores on the VoiceBank-DEMAND dataset.

Contribution

MetricGAN+ adds three training techniques, each motivated by domain knowledge of speech processing, to the original MetricGAN to improve speech enhancement performance.
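One of these techniques, as we read the paper, is a learnable sigmoid for the generator's mask: β·σ(αx), with a trainable slope α for each frequency bin and β slightly above 1 so that the mask can amplify a bin as well as attenuate it. The pure-Python sketch below assumes β = 1.2; `learnable_sigmoid` and `apply_mask` are hypothetical helper names, and in a real model α would be a trainable tensor updated by backpropagation rather than a plain list.

```python
import math


def learnable_sigmoid(x, alpha, beta=1.2):
    """beta * sigmoid(alpha * x): a mask value in the open interval (0, beta).

    beta = 1.2 is an assumption here; any beta > 1 lets the mask exceed 1."""
    z = alpha * x
    if z >= 0:
        return beta / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return beta * e / (1.0 + e)


def apply_mask(noisy_magnitude, logits, alphas, beta=1.2):
    """Scale each frequency bin of one frame by its own mask value.

    `alphas` holds one slope per frequency bin (trainable in a real model)."""
    return [m * learnable_sigmoid(z, a, beta)
            for m, z, a in zip(noisy_magnitude, logits, alphas)]
```

For example, `learnable_sigmoid(0.0, 1.0)` gives 0.6 (half of β), while large positive inputs approach 1.2, so under this assumed β a bin can be boosted by up to about 20%. A larger α makes the transition between attenuation and amplification steeper for that bin.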

Findings

1. PESQ improves by 0.3 over the original MetricGAN.

2. Achieves a state-of-the-art PESQ score of 3.15.

3. Results are demonstrated on the VoiceBank-DEMAND dataset.

Abstract

The discrepancy between the cost function used to train a speech enhancement model and human auditory perception often makes the quality of the enhanced speech unsatisfactory. Objective evaluation metrics that account for human perception can therefore serve as a bridge to narrow this gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric to a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which incorporates three training techniques based on domain knowledge of speech processing. Experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ increases the PESQ score by 0.3 compared with the previous MetricGAN and achieves state-of-the-art results (PESQ score =…
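The key mechanism in the abstract is that the metric is only ever queried for a score: the discriminator learns to predict the normalized metric score of an (enhanced, clean) pair, with clean vs. clean scored as 1, and the generator is updated through the discriminator's gradient toward that target of 1. The metric itself therefore never needs a gradient. The toy below is a minimal sketch of that loop under heavy simplifying assumptions, not the authors' implementation: the "speech" is a synthetic sine wave, the generator is a single gain, `black_box_metric` is a hypothetical rounded-SNR stand-in for PESQ, and the discriminator is a linear function of MSE fit by least squares. Following our reading of MetricGAN+, the discriminator's training set also includes the noisy input and a replay of previously generated outputs.

```python
import math
import random


def snr_db(estimate, clean):
    """SNR of `estimate` measured against `clean`, in dB."""
    signal = sum(c * c for c in clean)
    error = sum((e - c) ** 2 for e, c in zip(estimate, clean))
    return math.inf if error == 0 else 10.0 * math.log10(signal / error)


def black_box_metric(estimate, clean):
    """Hypothetical stand-in for PESQ: SNR rounded to whole dB, mapped to [0, 1].
    The rounding makes it non-differentiable; only its value is ever used."""
    snr = snr_db(estimate, clean)
    if math.isinf(snr):
        return 1.0
    return min(1.0, max(0.0, (round(snr) + 10) / 40))


def mse(estimate, clean):
    return sum((e - c) ** 2 for e, c in zip(estimate, clean)) / len(clean)


def fit_discriminator(pairs):
    """Toy discriminator D = w0 + w1 * mse(x, y), least-squares fit to
    (feature, metric score) pairs."""
    n = len(pairs)
    mean_f = sum(f for f, _ in pairs) / n
    mean_q = sum(q for _, q in pairs) / n
    sff = sum((f - mean_f) ** 2 for f, _ in pairs)
    sfq = sum((f - mean_f) * (q - mean_q) for f, q in pairs)
    w1 = sfq / sff if sff > 0 else 0.0
    return mean_q - w1 * mean_f, w1


def train(steps=100, lr=0.05, seed=0):
    rng = random.Random(seed)
    clean = [math.sin(0.3 * k) for k in range(256)]
    noisy = [c + rng.gauss(0.0, 0.5) for c in clean]
    gain = 1.0   # toy generator: enhanced = gain * noisy
    replay = []  # (feature, score) pairs from previously generated outputs
    # Fixed discriminator targets: clean vs. clean scores 1, plus the noisy input.
    anchors = [(0.0, 1.0), (mse(noisy, clean), black_box_metric(noisy, clean))]
    for _ in range(steps):
        enhanced = [gain * x for x in noisy]
        # Discriminator step: query the black-box metric and regress its score.
        replay.append((mse(enhanced, clean), black_box_metric(enhanced, clean)))
        w0, w1 = fit_discriminator(anchors + replay)
        # Generator step: gradient of (D(G(x), y) - 1)^2 w.r.t. gain, through D only.
        d_out = w0 + w1 * mse(enhanced, clean)
        dmse = 2 * sum((gain * x - c) * x for x, c in zip(noisy, clean)) / len(clean)
        gain -= lr * 2 * (d_out - 1.0) * w1 * dmse
    enhanced = [gain * x for x in noisy]
    return black_box_metric(noisy, clean), black_box_metric(enhanced, clean), gain
```

Calling `initial, final, gain = train()` returns the toy metric for the noisy input and for the final enhanced signal, plus the learned gain. Even though `black_box_metric` has zero gradient almost everywhere, the generator improves because it follows the smooth surrogate learned by the discriminator; the gain settles near the least-squares value at which the MSE gradient vanishes.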

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing