Dissecting adaptive methods in GANs
Samy Jelassi, David Dobre, Arthur Mensch, Yuanzhi Li, Gauthier Gidel

TL;DR
This paper investigates why adaptive optimization methods, especially Adam, help GAN training. It separates the magnitude and direction components of the Adam update and shows that normalized stochastic gradient descent ascent (nSGDA) can achieve similar performance and mode coverage.
Contribution
It introduces a theoretical framework comparing nSGDA and SGDA in GANs, showing nSGDA's ability to prevent mode collapse and replicate Adam's effectiveness.
Findings
Within the paper's theoretical framework, a GAN trained with nSGDA recovers all modes of the true distribution.
The adaptive magnitude of the Adam update, rather than its direction, is empirically shown to be key for GAN training.
Normalized gradient methods match Adam's performance on several datasets.
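As a rough illustration of what a normalized descent-ascent step looks like, the sketch below divides each player's gradient by its norm before applying the step. The function name `nsgda_step`, the step sizes, the whole-vector normalization, and the toy objective f(x, y) = x · y are all assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def nsgda_step(x, y, grad_x, grad_y, lr_x=0.01, lr_y=0.01, eps=1e-12):
    """One normalized descent-ascent step (hypothetical helper).

    x is the minimizing player (generator role), y the maximizing
    player (discriminator role). Each gradient is rescaled to unit
    norm, so the step length is set by the learning rate alone.
    """
    x_new = x - lr_x * grad_x / (np.linalg.norm(grad_x) + eps)  # descent
    y_new = y + lr_y * grad_y / (np.linalg.norm(grad_y) + eps)  # ascent
    return x_new, y_new

# Toy bilinear objective f(x, y) = x . y, chosen only for illustration.
x = np.array([1.0, -2.0])
y = np.array([0.5, 3.0])
grad_x = y  # df/dx
grad_y = x  # df/dy

x_new, y_new = nsgda_step(x, y, grad_x, grad_y)
```

Because of the normalization, both players move by a distance equal to their learning rate regardless of how large or small their raw gradients are.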
Abstract
Adaptive methods are a crucial component widely used for training generative adversarial networks (GANs). While there has been some work to pinpoint the "marginal value of adaptive methods" in standard tasks, it remains unclear why they are still critical for GAN training. In this paper, we formally study how adaptive methods help train GANs; inspired by the grafting method proposed in arXiv:2002.11803 [cs.LG], we separate the magnitude and direction components of the Adam updates, and graft them to the direction and magnitude of SGDA updates respectively. By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training. This motivates us to have a closer look at the class of normalized stochastic gradient descent ascent (nSGDA) methods in the context of GAN training.…
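The grafting idea described in the abstract can be sketched as follows: take the norm of one optimizer's update and apply it along the unit direction of another's. This is a minimal sketch under stated assumptions. The helper names `adam_update` and `graft`, the hyperparameter values, and the choice to normalize over the whole parameter vector (rather than, say, per layer) are illustrative and not confirmed by the text above.

```python
import numpy as np

def adam_update(grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam step with bias correction.

    Returns the step to subtract from the parameters and the new moments.
    """
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    step = lr * m_hat / (np.sqrt(v_hat) + eps)
    return step, m, v

def graft(magnitude_from, direction_from, eps=1e-12):
    """Combine the norm of one update with the unit direction of another.

    Whole-vector normalization is an assumption of this sketch.
    """
    unit = direction_from / (np.linalg.norm(direction_from) + eps)
    return np.linalg.norm(magnitude_from) * unit

grad = np.array([3.0, 4.0])
adam_step, m, v = adam_update(grad, np.zeros(2), np.zeros(2), t=1)
sgd_step = 0.1 * grad  # plain SGD step with an illustrative learning rate

# Magnitude of Adam, direction of SGD: the combination the abstract studies.
grafted = graft(adam_step, sgd_step)
```

Swapping the two arguments of `graft` gives the opposite combination, with the SGD magnitude applied along the Adam direction.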
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Music and Audio Processing
Methods: Stochastic Gradient Descent · Adam
