Minimizing Regret on Reflexive Banach Spaces and Learning Nash Equilibria in Continuous Zero-Sum Games

Maximilian Balandat; Walid Krichene; Claire Tomlin; Alexandre Bayen

arXiv:1606.01261 · cs.LG · June 7, 2016 · 1 citation

TL;DR

This paper extends online learning algorithms to infinite-dimensional Banach spaces, providing regret bounds and convergence guarantees for learning Nash equilibria in continuous zero-sum games without convexity assumptions on the strategy sets or reward functions.

Contribution

The paper generalizes Dual Averaging (Follow the Regularized Leader) to reflexive Banach spaces, derives regret bounds, and applies them to continuous zero-sum games, where it proves convergence to Nash equilibria.
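A minimal finite-dimensional sketch of Dual Averaging may help fix ideas. It assumes the compact set is $S = [0, 1]$ discretized to a grid of 101 points (the paper itself works directly in infinite dimensions), uses negative entropy as the regularizer (one common choice, which turns the Dual Averaging step into exponential weights), and uses the standard exponential-weights learning rate $\eta_t = \sqrt{8 \ln n / t}$ for rewards in $[0, 1]$. The reward functions, the function names `dual_averaging_entropic` and `regret`, and all constants are hypothetical.

```python
import numpy as np

def dual_averaging_entropic(rewards, eta0=np.sqrt(8.0)):
    """Entropic Dual Averaging (FTRL) on a finite grid of actions (hypothetical helper).

    rewards[t, i] in [0, 1] is the reward of grid point i at round t; the reward
    of a distribution x is the pairing <u_t, x> = sum_i rewards[t, i] * x[i].
    Returns the played distributions, one row per round.
    """
    T, n = rewards.shape
    U = np.zeros(n)                  # cumulative reward U_{t-1}, a dual-space element
    plays = np.empty((T, n))
    for t in range(T):
        eta = eta0 * np.sqrt(np.log(n) / (t + 1))  # decreasing rate, no horizon needed
        z = eta * U
        w = np.exp(z - z.max())      # maximizer of <eta U, x> - h(x) for negative entropy h
        plays[t] = w / w.sum()
        U += rewards[t]              # the reward vector is revealed after playing
    return plays

def regret(rewards, plays):
    """Best fixed grid point in hindsight minus the realized expected reward."""
    return rewards.sum(axis=0).max() - np.sum(rewards * plays)

# Hypothetical instance: S = [0, 1] discretized, uniformly continuous rewards in [0, 1]
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
T = 500
centers = rng.uniform(0.0, 1.0, size=T)
rewards = 0.5 * (1.0 + np.cos(2 * np.pi * (grid[None, :] - centers[:, None])))
plays = dual_averaging_entropic(rewards)
print(f"regret after {T} rounds: {regret(rewards, plays):.2f}")
```

Because the learning rate shrinks with $t$ rather than depending on a fixed horizon, the resulting regret bound holds at every round, which is the sense in which such bounds are called anytime.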

Findings

1. Derived explicit regret bounds in infinite-dimensional settings
2. Proved convergence of empirical distributions to Nash equilibria
3. Established Hannan-consistency of Dual Averaging in continuous games
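The convergence results in the list above can be illustrated with a hypothetical game on $[0, 1] \times [0, 1]$ with payoff $u(x, y) = (x - y)^2$, which player 1 maximizes and player 2 minimizes. The payoff is convex, not concave, in the maximizer's action, so standard convex-concave arguments do not apply. Its value is $1/4$: the minimizer plays $y = 1/2$ and the maximizer mixes $x = 0$ and $x = 1$ with equal probability. In this sketch both players run entropic Dual Averaging on a 101-point grid against the opponent's current mixed strategy (full-information expected rewards, an assumption made to keep it deterministic), and the time-averaged strategies are checked through their duality gap. The helper names `softmax` and `play_zero_sum` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def play_zero_sum(payoff, T=2000, eta0=np.sqrt(8.0)):
    """Both players run entropic Dual Averaging on a finite grid.

    payoff[i, j] = u(x_i, y_j) in [0, 1]; player 1 maximizes u, player 2 minimizes it.
    Returns the time-averaged mixed strategies (x_bar, y_bar).
    """
    n, m = payoff.shape
    U1 = np.zeros(n)                 # cumulative expected rewards of player 1
    L2 = np.zeros(m)                 # cumulative expected losses of player 2
    x_bar, y_bar = np.zeros(n), np.zeros(m)
    for t in range(1, T + 1):
        p = softmax(eta0 * np.sqrt(np.log(n) / t) * U1)
        q = softmax(-eta0 * np.sqrt(np.log(m) / t) * L2)
        U1 += payoff @ q             # expected payoff of each x_i against q
        L2 += p @ payoff             # expected payoff of each y_j against p
        x_bar += p
        y_bar += q
    return x_bar / T, y_bar / T

grid = np.linspace(0.0, 1.0, 101)
payoff = (grid[:, None] - grid[None, :]) ** 2   # hypothetical game u(x, y) = (x - y)^2
x_bar, y_bar = play_zero_sum(payoff)
upper = (payoff @ y_bar).max()       # player 1's best-response value against y_bar
lower = (x_bar @ payoff).min()       # player 2's best-response value against x_bar
gap = upper - lower                  # duality gap of the averaged strategies
print(f"value bracket: [{lower:.4f}, {upper:.4f}], gap = {gap:.4f}")
```

For averaged play, the duality gap equals the sum of the two players' regrets divided by $T$, so no-regret dynamics drive the averaged strategies toward the set of equilibria of the discretized game, and the bracket always contains the value $1/4$.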

Abstract

We study a general version of the adversarial online learning problem. We are given a decision set $\mathcal{X}$ in a reflexive Banach space $X$ and a sequence of reward vectors in the dual space of $X$. At each iteration, we choose an action from $\mathcal{X}$, based on the observed sequence of previous rewards. Our goal is to minimize regret, defined as the gap between the realized reward and the reward of the best fixed action in hindsight. Using results from infinite-dimensional convex analysis, we generalize the method of Dual Averaging (or Follow the Regularized Leader) to our setting and obtain general upper bounds on the worst-case regret that subsume a wide range of results from the literature. Under the assumption of uniformly continuous rewards, we obtain explicit anytime regret bounds in a setting where the decision set is the set of probability distributions on a compact…
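In symbols, a standard way to write the regret and the Dual Averaging update described in the abstract is given below, assuming rewards $u_\tau \in X^*$, plays $x_\tau \in \mathcal{X}$, a learning-rate sequence $\eta_t > 0$, and a strongly convex regularizer $h$; the paper's exact notation and its conditions on $h$ and $\eta_t$ may differ.

```latex
\begin{aligned}
R_t &= \sup_{x \in \mathcal{X}} \sum_{\tau=1}^{t} \langle u_\tau, x \rangle - \sum_{\tau=1}^{t} \langle u_\tau, x_\tau \rangle, \\
U_t &= \sum_{\tau=1}^{t} u_\tau \in X^*, \\
x_{t+1} &\in \operatorname*{arg\,max}_{x \in \mathcal{X}} \bigl\{ \eta_t \langle U_t, x \rangle - h(x) \bigr\}.
\end{aligned}
```

A strategy is called Hannan-consistent when $\limsup_{t \to \infty} R_t / t \le 0$ almost surely, that is, when its average regret vanishes.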

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems