Alternating Mirror Descent for Constrained Min-Max Games
Andre Wibisono, Molei Tao, Georgios Piliouras

TL;DR
This paper introduces an alternating mirror descent algorithm for constrained min-max games, demonstrating improved convergence properties over simultaneous methods by analyzing its regret bounds and behavior in constrained and unconstrained settings.
Contribution
It proposes and analyzes an alternating mirror descent algorithm for constrained zero-sum games, showing better convergence than simultaneous methods and connecting to existing gradient descent results.
Findings
Achieves an $O(K^{-2/3})$ bound on the average regret of the proposed algorithm after $K$ iterations.
Contrasts this with simultaneous mirror descent, which is known to diverge in constrained settings.
Recovers known results for unconstrained zero-sum games with alternating gradient descent.
Abstract
In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. An instance of natural occurrences of such constraints is when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which each player takes turns to take action following the mirror descent algorithm for constrained optimization. We interpret alternating mirror descent as an alternating discretization of a skew-gradient flow in the dual space, and use tools from convex optimization and a modified energy function to establish an $O(K^{-2/3})$ bound on its average regret after $K$ iterations. This quantitatively verifies the algorithm's better behavior than the simultaneous version of the mirror descent algorithm, which is known to diverge and yields an $O(K^{-1/2})$ average regret bound. In the special case…
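The turn-taking update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the payoff convention $\min_x \max_y x^\top A y$ over probability simplices and an entropic mirror map, so that mapping a dual variable back to the simplex is a softmax. The function names, the `alternating` flag, the step size, and the example matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z):
    """Map a dual vector to the probability simplex (entropic mirror map)."""
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def mirror_descent_play(A, eta=0.1, num_iters=1000, alternating=True):
    """Run mirror descent on min_x max_y x^T A y and return averaged strategies.

    Assumption: entropic regularizer on each simplex; dual variables u, v
    accumulate gradients and the primal strategies are softmax(u), softmax(v).
    """
    m, n = A.shape
    u, v = np.zeros(m), np.zeros(n)      # dual variables, start at uniform play
    x, y = softmax(u), softmax(v)
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(num_iters):
        x_prev = x
        # Min player steps against the current y.
        u = u - eta * (A @ y)
        x = softmax(u)
        # Alternating: max player responds to the freshly updated x.
        # Simultaneous: max player responds to the previous x.
        x_seen = x if alternating else x_prev
        v = v + eta * (A.T @ x_seen)
        y = softmax(v)
        x_sum += x
        y_sum += y
    return x_sum / num_iters, y_sum / num_iters


def duality_gap(A, x, y):
    """Best-response gap of (x, y); zero exactly at a Nash equilibrium."""
    return np.max(A.T @ x) - np.min(A @ y)


if __name__ == "__main__":
    A = np.array([[2.0, -1.0], [-1.0, 1.0]])  # illustrative 2x2 game
    x_avg, y_avg = mirror_descent_play(A, eta=0.1, num_iters=2000)
    print("average strategies:", x_avg, y_avg)
    print("duality gap:", duality_gap(A, x_avg, y_avg))
```

The only difference between the two modes is which copy of `x` the second player sees. Setting `alternating=False` gives the simultaneous update, which makes it easy to compare the two schemes on a small game.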
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods
