Beyond Time-Average Convergence: Near-Optimal Uncoupled Online Learning via Clairvoyant Multiplicative Weights Update

Georgios Piliouras; Ryann Sim; Stratis Skoulakis

arXiv:2111.14737 · cs.GT · June 30, 2022 · 1 citation

TL;DR

This paper introduces Clairvoyant Multiplicative Weights Update (CMWU), a regret-minimization algorithm for general games in which agents update their mixed strategies using the payoffs induced by the next round's behavior. CMWU achieves constant regret, and its updates can be computed efficiently via a contraction map.

Contribution

The paper presents CMWU, a novel algorithm that attains constant regret in all normal-form games and converges quickly to a coarse correlated equilibrium, improving upon existing rates.

Findings

01

CMWU achieves constant regret of $\ln(m)/\eta$ in all normal-form games with $m$ actions and fixed step-size $\eta$.

02

When the step-size satisfies $\eta \le \frac{1}{(n-1)V}$, where $n$ is the number of agents and $[0, V]$ is the payoff range, the CMWU updates can be computed linearly fast via a contraction map.

03

The dynamics converge to a coarse correlated equilibrium at a rate of $O(nV \log m \log T / T)$.
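
A rough reading of where this rate could come from, sketched under assumptions that are not stated on this page: suppose each clairvoyant step is approximated by $k = O(\log T)$ uncoupled rounds of the contraction map, and the step-size is set to $\eta = \frac{1}{(n-1)V}$ (the largest value the abstract allows; the paper's exact constants may differ). The per-agent regret bound $\ln(m)/\eta$ is then averaged over roughly $T/k$ clairvoyant steps:

```latex
\frac{\text{regret}}{\text{number of clairvoyant steps}}
\;\le\; \frac{\ln(m)/\eta}{T/k}
\;=\; \frac{(n-1)\,V \ln(m)\, k}{T}
\;=\; O\!\left(\frac{nV \log m \,\log T}{T}\right),
\qquad k = O(\log T).
```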

Abstract

In this paper, we provide a novel and simple algorithm, Clairvoyant Multiplicative Weights Updates (CMWU), for regret minimization in general games. CMWU effectively corresponds to the standard MWU algorithm but where all agents, when updating their mixed strategies, use the payoff profiles based on tomorrow's behavior, i.e. the agents are clairvoyant. CMWU achieves constant regret of $\ln(m)/\eta$ in all normal-form games with $m$ actions and fixed step-sizes $\eta$. Although CMWU encodes in its definition a fixed point computation, which in principle could result in dynamics that are neither computationally efficient nor uncoupled, we show that both of these issues can be largely circumvented. Specifically, as long as the step-size $\eta$ is upper bounded by $\frac{1}{(n-1)V}$, where $n$ is the number of agents and $[0, V]$ is the payoff range, then the CMWU updates can be computed…
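
The clairvoyant update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a two-player game ($n = 2$) with payoff matrices `A` and `B` in $[0, 1]$, picks a step-size of half the stated bound, and approximates the fixed point by applying the MWU map a fixed number of times. The function names (`mwu_step`, `payoff_vectors`, `clairvoyant_step`), the example game, and the iteration count are all hypothetical choices made for illustration.

```python
import math

def mwu_step(x_prev, payoffs, eta):
    """Multiplicative weights: x is proportional to x_prev * exp(eta * payoffs)."""
    weights = [p * math.exp(eta * u) for p, u in zip(x_prev, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]

def payoff_vectors(A, B, x, y):
    """Expected payoff of every pure action against the opponent's mixed strategy.
    A[i][j] / B[i][j] are the row / column player's payoffs at (row i, column j)."""
    u_row = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    u_col = [sum(B[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
    return u_row, u_col

def clairvoyant_step(A, B, x_prev, y_prev, eta, iters=50):
    """Approximate the pair (x, y) that satisfies
         x ~ x_prev * exp(eta * A y),   y ~ y_prev * exp(eta * B^T x),
    i.e. an MWU step taken against the *new* strategies, by iterating the map.
    iters=50 is an arbitrary illustrative choice, not a value from the paper."""
    x, y = x_prev, y_prev
    for _ in range(iters):
        u_row, u_col = payoff_vectors(A, B, x, y)
        x, y = mwu_step(x_prev, u_row, eta), mwu_step(y_prev, u_col, eta)
    return x, y

# Hypothetical 2x2 game with payoffs in [0, V], V = 1.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.0, 1.0], [1.0, 0.5]]
n, V, m = 2, 1.0, 2
eta = 0.5 / ((n - 1) * V)  # assumption: half of the bound 1 / ((n - 1) V)

x, y = [0.5, 0.5], [0.5, 0.5]   # uniform initial strategies
cumulative = [0.0, 0.0]         # cumulative payoff of each pure row action
earned = 0.0                    # cumulative expected payoff of the row player
for t in range(200):
    x, y = clairvoyant_step(A, B, x, y, eta)
    u_row, _ = payoff_vectors(A, B, x, y)
    cumulative = [c + u for c, u in zip(cumulative, u_row)]
    earned += sum(p * u for p, u in zip(x, u_row))

regret_row = max(cumulative) - earned
print("row regret:", regret_row, "bound ln(m)/eta:", math.log(m) / eta)
```

In this sketch the row player's regret is measured against the best fixed action in hindsight and can be compared with $\ln(m)/\eta$. Note that the inner loop lets each player see the other's tentative strategy, so it illustrates the fixed-point computation rather than the uncoupled dynamics the paper analyzes.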

Videos

Beyond Time-Average Convergence: Near-Optimal Uncoupled Online Learning via Clairvoyant Multiplicative Weights Update · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Game Theory and Applications