Complex Momentum for Optimization in Games
Jonathan Lorraine, David Acuna, Paul Vicol, David Duvenaud

TL;DR
This paper introduces complex-valued momentum for gradient-based optimization in differentiable games, proves convergence on bilinear zero-sum games, and reports improved results in adversarial settings such as GAN training.
Contribution
The paper proposes a complex-valued momentum method that still yields real-valued parameter updates, extends it to an Adam variant, and shows empirical benefits when training GANs.
Findings
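Convergence is proven on bilinear zero-sum games for both simultaneous and alternating updates.
Complex momentum finds better solutions in GAN training at almost identical computational cost.
A complex-valued Adam variant trains BigGAN to better inception scores on CIFAR-10.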
Abstract
We generalize gradient descent with momentum for optimization in differentiable games to have complex-valued momentum. We give theoretical motivation for our method by proving convergence on bilinear zero-sum games for simultaneous and alternating updates. Our method gives real-valued parameter updates, making it a drop-in replacement for standard optimizers. We empirically demonstrate that complex-valued momentum can improve convergence in realistic adversarial games - like generative adversarial networks - by showing we can find better solutions with an almost identical computational cost. We also show a practical generalization to a complex-valued Adam variant, which we use to train BigGAN to better inception scores on CIFAR-10.
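The update described in the abstract can be sketched on the bilinear zero-sum game min_x max_y x*y. This is a minimal sketch, not the authors' code: it assumes a recurrence of the form mu <- beta*mu - g followed by w <- w + Re(alpha*mu), with simultaneous updates for both players. The function names, the step size alpha = 0.05, and the momentum beta = 0.9*exp(i*pi/8) are illustrative choices that do not come from this page.

```python
import cmath
import math


def game_gradient(x, y):
    """Per-player gradients for min_x max_y x*y (hypothetical helper).

    Player x descends on x*y, player y descends on -(x*y).
    """
    return y, -x


def complex_momentum(alpha, beta, steps, x=1.0, y=1.0):
    """Run simultaneous updates and return the final distance to (0, 0).

    Assumed recurrence (an illustration, not confirmed by this page):
        mu <- beta * mu - g
        w  <- w + Re(alpha * mu)
    beta may be complex; the parameters x and y always stay real.
    """
    mu_x, mu_y = 0j, 0j  # complex momentum buffers, one per parameter
    for _ in range(steps):
        gx, gy = game_gradient(x, y)
        mu_x = beta * mu_x - gx
        mu_y = beta * mu_y - gy
        # Only the real part of the scaled buffer reaches the parameters.
        x += (alpha * mu_x).real
        y += (alpha * mu_y).real
    return math.hypot(x, y)


if __name__ == "__main__":
    beta = 0.9 * cmath.exp(1j * math.pi / 8)  # illustrative complex momentum
    print("complex momentum:", complex_momentum(0.05, beta, 2000))
    print("no momentum:     ", complex_momentum(0.05, 0.0, 2000))
```

With beta = 0 the loop reduces to plain simultaneous gradient descent, whose iterates spiral away from the equilibrium on this game. The complex beta chosen here should instead pull the iterates toward the equilibrium at (0, 0), while the parameter updates remain real-valued.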
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
Methods: Dense Connections · Softmax · Feedforward Network · Residual Connection · Non-Local Operation · GAN Hinge Loss · Two Time-scale Update Rule · Conditional Batch Normalization
