SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks

Vasily Zadorozhnyy; Qiang Ye; Kazuhito Koishida

arXiv:2210.14474 · cs.SD · October 27, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a novel training scheme for GANs in speech enhancement, incorporating consistency loss and self-correcting discriminator optimization to improve performance and stability.

Contribution

The paper proposes GAN training techniques for speech enhancement, namely consistency loss functions and self-correcting discriminator optimization, which yield consistent improvements across several GAN-based SE models.

Findings

01. Achieved state-of-the-art results on the Voice Bank+DEMAND dataset.

02. Demonstrated consistent improvements across multiple GAN-based SE models.

03. Improved training stability and performance in speech enhancement GANs.

Abstract

In recent years, Generative Adversarial Networks (GANs) have produced significantly improved results in speech enhancement (SE) tasks. They are difficult to train, however. In this work, we introduce several improvements to the GAN training schemes, which can be applied to most GAN-based SE models. We propose using consistency loss functions, which target the inconsistency in time and time-frequency domains caused by Fourier and Inverse Fourier Transforms. We also present self-correcting optimization for training a GAN discriminator on SE tasks, which helps avoid "harmful" training directions for parts of the discriminator loss function. We have tested our proposed methods on several state-of-the-art GAN-based SE models and obtained consistent improvements, including new state-of-the-art results for the Voice Bank+DEMAND dataset.
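
The inconsistency mentioned in the abstract can be illustrated with a small sketch. A complex spectrogram is consistent only if it is the STFT of some time-domain signal, so applying an inverse STFT followed by an STFT leaves it unchanged; a spectrogram modified by a network generally does not have this property. The code below is not the paper's loss formulation. It is a minimal NumPy illustration of measuring that gap, and the names `stft`, `istft`, `inconsistency`, `N_FFT`, and `HOP` as well as the window and frame sizes are assumptions chosen for the example.

```python
import numpy as np

N_FFT, HOP = 64, 16  # illustrative sizes, not taken from the paper


def stft(x):
    """Hann-windowed frames of x, each transformed with a real FFT."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack(
        [x[i * HOP:i * HOP + N_FFT] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec):
    """Least-squares inverse: windowed overlap-add, normalized by the
    summed squared window."""
    window = np.hanning(N_FFT)
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    length = N_FFT + HOP * (spec.shape[0] - 1)
    signal, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        signal[i * HOP:i * HOP + N_FFT] += frame * window
        norm[i * HOP:i * HOP + N_FFT] += window ** 2
    # Guard against division by zero where the window vanishes (the edges).
    return signal / np.maximum(norm, 1e-8)


def inconsistency(spec):
    """Mean squared distance between spec and STFT(iSTFT(spec))."""
    return float(np.mean(np.abs(spec - stft(istft(spec))) ** 2))


rng = np.random.default_rng(0)

# The STFT of a real signal is consistent by construction.
clean = stft(rng.standard_normal(1024))

# An arbitrary perturbation in the time-frequency domain (a stand-in for
# a network's edit, purely hypothetical) is generally not consistent.
noise = 0.1 * (
    rng.standard_normal(clean.shape) + 1j * rng.standard_normal(clean.shape)
)
perturbed = clean + noise

consistent_gap = inconsistency(clean)
perturbed_gap = inconsistency(perturbed)
print(consistent_gap, perturbed_gap)
```

In this sketch the gap for `clean` is at the level of floating-point error, while the gap for `perturbed` is clearly nonzero. A term of this kind could in principle be added to a generator's training objective to penalize inconsistent outputs; how the paper actually defines and weights its consistency losses is not stated in the abstract.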

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders