Conditioning Trick for Training Stable GANs
Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich

TL;DR
This paper introduces a novel conditioning trick for GAN training that improves the stability and quality of generated audio spectrograms by aligning generator outputs with the spectral-domain properties of real samples, enhancing performance on environmental sound datasets.
Contribution
The paper proposes a new conditioning trick based on spectral-domain analysis to stabilize GAN training and improve audio synthesis quality, and integrates residual networks into the BigGAN architecture.
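The conditioning trick rests on the departure from normality (DFN) of a matrix, computed from its Schur decomposition. A minimal sketch, assuming each spectrogram is treated as a square real matrix and using Henrici's definition of DFN (the paper's exact normalization and loss form are not given here): with A = U T U^* and T = D + N (D diagonal, N strictly upper triangular), the DFN is ||N||_F, which equals sqrt(||A||_F^2 - sum |lambda_i|^2) because the Frobenius norm is unitarily invariant. The function `dfn_penalty` is hypothetical: it illustrates one way to pull the generator's DFN toward that of real samples by comparing batch means.

```python
import numpy as np


def departure_from_normality(a) -> float:
    """Henrici's departure from normality of a square matrix.

    Equals ||N||_F, where T = D + N is the Schur form of `a`. Since
    ||a||_F = ||T||_F and D, N occupy disjoint entries of T,
    ||N||_F^2 = ||a||_F^2 - sum |lambda_i|^2, so eigenvalues suffice.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("departure from normality needs a square matrix")
    eig = np.linalg.eigvals(a)  # may be complex for real, non-symmetric input
    sq = np.linalg.norm(a, "fro") ** 2 - np.sum(np.abs(eig) ** 2)
    return float(np.sqrt(max(sq, 0.0)))  # clamp tiny negative round-off


def dfn_penalty(fake, real) -> float:
    """Hypothetical conditioning term (an assumption, not the paper's loss):
    absolute gap between the mean DFN of generated and real matrices."""
    d_fake = np.mean([departure_from_normality(m) for m in fake])
    d_real = np.mean([departure_from_normality(m) for m in real])
    return float(abs(d_fake - d_real))
```

For a normal matrix such as the symmetric `[[2, 1], [1, 3]]` the DFN is zero, while for the upper-triangular `[[1, 2], [0, 1]]` it is 2 (the norm of its off-diagonal part). Working from eigenvalues avoids an explicit factorization; `scipy.linalg.schur` could be used instead if the triangular factor itself is needed.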
Findings
Outperforms baseline GANs on multiple metrics
Achieves higher Inception Scores and lower Fréchet Inception Distance (FID)
Produces high-quality audio spectrograms that retain some phase information
Abstract
In this paper, we propose a conditioning trick, called difference departure from normality, applied to the generator network to address instability issues during GAN training. We force the generator to get closer to the departure from normality function of real samples, computed in the spectral domain of the Schur decomposition. This binding makes the generator amenable to truncation without limiting its exploration of all possible modes. We slightly modify the BigGAN architecture, incorporating a residual network for synthesizing 2D representations of audio signals, which enables reconstructing high-quality sounds with some preserved phase information. Additionally, the proposed conditional training scenario makes a trade-off between fidelity and variety for the generated spectrograms. The experimental results on the UrbanSound8k and ESC-50 environmental sound datasets and the Mozilla Common Voice…
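The abstract states that the conditioning makes the generator amenable to truncation. In BigGAN, the truncation trick samples latent vectors from a truncated normal distribution by resampling any coordinate whose magnitude exceeds a threshold; shrinking the threshold typically trades variety for fidelity. The NumPy sketch below shows that generic resampling step only; the function name `truncated_latents` and its parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np


def truncated_latents(n: int, dim: int, threshold: float, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: draw n latent vectors of size dim from N(0, I),
    resampling every coordinate with |z| > threshold (BigGAN-style truncation)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, dim))
    mask = np.abs(z) > threshold
    while mask.any():
        # Redraw only the out-of-range coordinates, keeping the rest.
        z[mask] = rng.standard_normal(int(mask.sum()))
        mask = np.abs(z) > threshold
    return z
```

For example, `truncated_latents(16, 128, 0.5)` returns a 16 x 128 array with every entry in [-0.5, 0.5], which would then be fed to the generator; a smaller threshold concentrates samples near the mode of the latent prior.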
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Dense Connections · 1x1 Convolution · Feedforward Network · Non-Local Operation · Softmax · Convolution · Non-Local Block · Conditional Batch Normalization
