Minimizing Regret on Reflexive Banach Spaces and Learning Nash Equilibria in Continuous Zero-Sum Games
Maximilian Balandat, Walid Krichene, Claire Tomlin, Alexandre Bayen

TL;DR
This paper extends online learning algorithms to infinite-dimensional reflexive Banach spaces. It provides regret bounds and convergence guarantees for learning Nash equilibria in continuous zero-sum games, without convexity assumptions on the strategy sets or the payoff functions.
Contribution
The paper generalizes Dual Averaging (Follow the Regularized Leader) to reflexive Banach spaces and derives worst-case regret bounds. It then applies these bounds to continuous zero-sum games and shows that play converges to Nash equilibria.
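For orientation, the Dual Averaging update and the regret it controls can be written in a standard textbook form. The notation is ours and not necessarily the paper's: $\mathcal{X}$ is the decision set in the reflexive Banach space $X$, $u_t \in X^*$ is the reward vector at round $t$, $h$ is a strongly convex regularizer, and $\eta_t$ is a learning rate. The last line shows one illustrative choice, the entropic regularizer $h(x) = \int_S x \log x \, d\mu$ on densities over a compact set $S$ with reference measure $\mu$. For this regularizer the update has a closed form.

```latex
\begin{aligned}
U_t &= \sum_{s=1}^{t} u_s, \qquad
x_{t+1} \in \operatorname*{arg\,max}_{x \in \mathcal{X}}
  \Big\{ \langle U_t, x \rangle - \tfrac{1}{\eta_t}\, h(x) \Big\}, \\
\mathcal{R}_T &= \sup_{x \in \mathcal{X}} \sum_{t=1}^{T} \langle u_t, x \rangle
  - \sum_{t=1}^{T} \langle u_t, x_t \rangle, \\
x_{t+1}(s) &= \frac{\exp\big(\eta_t U_t(s)\big)}
  {\int_S \exp\big(\eta_t U_t(s')\big)\, d\mu(s')}
  \quad \text{(entropic } h\text{)}.
\end{aligned}
```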
Findings
Derived explicit regret bounds in infinite-dimensional settings
Proved that, when both players use Hannan-consistent strategies, the empirical distributions of play converge to the set of Nash equilibria
Established Hannan-consistency of Dual Averaging in continuous games
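The equilibrium-learning findings can be illustrated numerically with a minimal sketch under strong simplifying assumptions. Both strategy spaces are discretized to a common grid on [0, 1]. Each player runs entropic Dual Averaging (exponential weights) with the classical learning rate $\eta_t = \sqrt{8 \ln n / t}$. The sketch averages the players' mixed strategies instead of sampling actions, so the run is deterministic. The game $u(x, y) = (x - y)^2$ and the helper names `entropic_da` and `self_play` are hypothetical. This game was chosen because its equilibrium is easy to verify: player 2 plays $y = 1/2$, and player 1 puts mass $1/2$ on each of $x = 0$ and $x = 1$, giving a value of $1/4$.

```python
import numpy as np

def entropic_da(cum_reward, eta):
    """Mixed strategy proportional to exp(eta * cumulative reward vector)."""
    z = eta * cum_reward
    z = np.exp(z - z.max())              # shift for numerical stability
    return z / z.sum()

def self_play(payoff, grid, T):
    """Both players run entropic Dual Averaging on the same grid (hypothetical helper).

    payoff(x, y) is the reward to the maximizing player 1; player 2 minimizes it.
    Returns the time-averaged mixed strategies (xbar, ybar) and the payoff matrix A.
    """
    A = payoff(grid[:, None], grid[None, :])   # A[i, j] = u(grid[i], grid[j])
    n = len(grid)
    U1, U2 = np.zeros(n), np.zeros(n)          # cumulative rewards (player 2: negated losses)
    xbar, ybar = np.zeros(n), np.zeros(n)
    for t in range(1, T + 1):
        eta = np.sqrt(8 * np.log(n) / t)       # classical anytime learning rate
        x, y = entropic_da(U1, eta), entropic_da(U2, eta)
        U1 += A @ y                            # player 1's reward vector this round
        U2 -= A.T @ x                          # player 2's loss vector, negated
        xbar += x / T
        ybar += y / T
    return xbar, ybar, A

# Illustrative game u(x, y) = (x - y)^2 on [0, 1]^2.
grid = np.linspace(0.0, 1.0, 101)
xbar, ybar, A = self_play(lambda x, y: (x - y) ** 2, grid, T=5000)
gap = (A @ ybar).max() - (xbar @ A).min()      # duality gap of the averaged strategies
print(f"duality gap = {gap:.4f}, value estimate = {xbar @ A @ ybar:.4f}")
```

The duality gap of the averaged strategies equals the sum of the two players' regrets divided by $T$. Any pair of learners with sublinear regret therefore drives the gap to zero. This is a finite-grid analogue of the convergence result, not a reproduction of the paper's experiments.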
Abstract
We study a general version of the adversarial online learning problem. We are given a decision set in a reflexive Banach space and a sequence of reward vectors in the dual of that space. At each iteration, we choose an action from the decision set based on the observed sequence of previous rewards. Our goal is to minimize regret, defined as the gap between the realized reward and the reward of the best fixed action in hindsight. Using results from infinite-dimensional convex analysis, we generalize the method of Dual Averaging (or Follow the Regularized Leader) to our setting and obtain general upper bounds on the worst-case regret that subsume a wide range of results from the literature. Under the assumption of uniformly continuous rewards, we obtain explicit anytime regret bounds in a setting where the decision set is the set of probability distributions on a compact…
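The regret notion in the abstract can be made concrete with a minimal sketch. It rests on assumptions not taken from the paper. The compact set is $S = [0, 1]$, discretized to a finite grid. The regularizer is entropic, so Dual Averaging reduces to exponential weights. The reward functions $r_t(s) = 1 - |s - c_t|$ are a hypothetical, uniformly continuous example with random centers $c_t$. Regret is measured against the best fixed grid point in hindsight, and the function names are illustrative.

```python
import numpy as np

def dual_averaging_entropic(reward_fns, grid, eta):
    """Dual Averaging (FTRL) with an entropic regularizer on a finite grid (illustrative).

    reward_fns: sequence of vectorized callables r_t(s), one per round.
    grid: 1-D array of points discretizing the compact set S.
    eta: callable t -> learning rate eta_t for rounds t = 1, 2, ...
    Returns the expected reward earned in each round and the cumulative
    reward vector U_T evaluated on the grid.
    """
    cum = np.zeros(len(grid))            # U_{t-1}: sum of past reward vectors on the grid
    received = []
    for t, r in enumerate(reward_fns, start=1):
        logits = eta(t) * cum
        logits -= logits.max()           # shift for numerical stability
        x = np.exp(logits)
        x /= x.sum()                     # x_t proportional to exp(eta_t * U_{t-1})
        u = r(grid)                      # reward vector revealed after playing x_t
        received.append(float(x @ u))    # expected reward of the mixed action
        cum += u
    return np.array(received), cum

def regret(received, cum):
    """Reward of the best fixed grid point in hindsight minus the realized reward."""
    return float(cum.max() - received.sum())

# Hypothetical Lipschitz rewards r_t(s) = 1 - |s - c_t| on S = [0, 1].
rng = np.random.default_rng(0)
T, grid = 2000, np.linspace(0.0, 1.0, 201)
centers = rng.uniform(0.0, 1.0, size=T)
rewards = [lambda s, c=c: 1.0 - np.abs(s - c) for c in centers]  # c=c avoids late binding
eta = lambda t: np.sqrt(8 * np.log(len(grid)) / t)
received, cum = dual_averaging_entropic(rewards, grid, eta)
R = regret(received, cum)
bound = 2 * np.sqrt(T / 2 * np.log(len(grid))) + np.sqrt(np.log(len(grid)) / 8)
print(f"regret = {R:.2f}, bound = {bound:.2f}")
```

With rewards in $[0, 1]$ and $\eta_t = \sqrt{8 \ln n / t}$, the classical exponential-weights analysis bounds this grid regret by $2\sqrt{(T/2)\ln n} + \sqrt{\ln n / 8}$ for every reward sequence. The paper's bounds, by contrast, apply directly to the infinite-dimensional decision set without discretizing $S$.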
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
