Negative Momentum for Improved Game Dynamics
Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Rémi Le Priol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas

TL;DR
This paper investigates the use of negative momentum in gradient-based methods for differentiable games, demonstrating improved stability and convergence in complex game dynamics like GAN training.
Contribution
It introduces the concept of negative momentum in alternating gradient updates, providing theoretical and empirical evidence of its effectiveness in stabilizing game dynamics.
Findings
Alternating updates are more stable than simultaneous ones.
Negative momentum enables convergence in difficult adversarial problems.
Improved training stability for GANs with negative momentum.
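The first finding can be checked on the smallest possible game. The sketch below assumes the bilinear game f(x, y) = x·y, where x minimizes and y maximizes and the unique equilibrium is (0, 0); this summary does not name the paper's toy problem, and the step size eta = 0.3 and start point (1, 1) are arbitrary illustrative choices. It compares simultaneous updates, where both players read the old iterate, with alternating updates, where y responds to the freshly updated x.

```python
import math

def simultaneous_step(x, y, eta):
    # Both players use the OLD iterate: x descends on x*y, y ascends on x*y.
    return x - eta * y, y + eta * x

def alternating_step(x, y, eta):
    # x moves first; y then reacts to the NEW x before taking its own step.
    x_new = x - eta * y
    y_new = y + eta * x_new
    return x_new, y_new

def run(step, n_steps=200, eta=0.3, x0=1.0, y0=1.0):
    x, y = x0, y0
    for _ in range(n_steps):
        x, y = step(x, y, eta)
    # Euclidean distance to the equilibrium (0, 0).
    return math.hypot(x, y)

sim_norm = run(simultaneous_step)
alt_norm = run(alternating_step)
print(f"simultaneous: {sim_norm:.3e}, alternating: {alt_norm:.3f}")
```

On this game a simultaneous step multiplies x² + y² by exactly (1 + eta²), so the iterates spiral outward at every step. The alternating step instead conserves x² + y² − eta·x·y, so the iterates stay on a bounded ellipse: stable, but still not converging to (0, 0). That remaining gap is what the negative momentum term is meant to close.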
Abstract
Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games are often optimized with simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optimization, game dynamics are more complex and less understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieve convergence not only on a difficult toy adversarial problem but also on saturating GANs, which are notoriously difficult to train.
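To make "alternating gradient updates with a negative momentum term" concrete, here is a minimal sketch on the same assumed bilinear game f(x, y) = x·y. It uses one common way of writing heavy-ball momentum, x_{t+1} = x_t − eta·∇ + beta·(x_t − x_{t−1}), applied alternately to the two players; this exact update form, eta = 0.3, and beta = −0.5 are assumptions for illustration, not settings quoted from the paper. Setting beta = 0 recovers plain alternating gradient descent.

```python
import math

def alternating_momentum(beta, eta=0.3, n_steps=2000, x0=1.0, y0=1.0):
    """Alternating heavy-ball updates on the bilinear game min_x max_y x*y.

    beta < 0 is negative momentum; beta = 0 is plain alternating GD.
    Returns the final distance to the equilibrium (0, 0).
    """
    x_prev, y_prev = x0, y0
    x, y = x0, y0
    for _ in range(n_steps):
        # Player x: d(x*y)/dx = y; take a descent step plus momentum.
        x_new = x - eta * y + beta * (x - x_prev)
        # Player y: d(x*y)/dy = x, evaluated at the UPDATED x (alternating);
        # take an ascent step plus momentum.
        y_new = y + eta * x_new + beta * (y - y_prev)
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return math.hypot(x, y)

for beta in (0.0, -0.5):
    print(f"beta={beta:+.1f}: distance to equilibrium = {alternating_momentum(beta):.3e}")
```

With beta = 0 the iterates keep orbiting at a roughly constant distance from (0, 0), while beta = −0.5 drives them to the equilibrium. A linearization of this scalar case (our own, not taken from the paper) gives the characteristic equation (λ − 1)²(λ − beta)² + eta²·λ³ = 0; for small eta the dominant roots satisfy |λ|² ≈ 1 + 2·eta²·beta / (1 − beta)³, which is below 1 exactly when beta < 0.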
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
