No-Regret Learning in Unknown Games with Correlated Payoffs

Pier Giuseppe Sessa; Ilija Bogunovic; Maryam Kamgarpour; Andreas Krause

arXiv:1909.08540·cs.LG·October 29, 2019

TL;DR

This paper introduces GP-MW, a bandit learning algorithm for repeated multi-agent games with an unknown reward function and correlated payoffs. It uses Gaussian processes to obtain regret bounds close to those available under full-information feedback, and is evaluated on traffic routing and movie recommendation tasks.

Contribution

It proposes GP-MW, a novel kernel-based bandit algorithm that exploits payoff correlations and outperforms existing methods in unknown game settings.

Findings

01. GP-MW achieves kernel-dependent regret bounds similar to full information settings.

02. The algorithm outperforms baselines in traffic routing and movie recommendation tasks.

03. Experimental results show GP-MW's performance is often close to full-information methods.

Abstract

We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single player online learning algorithms attain strong regret bounds when provided with full information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performance. In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents' actions. This feedback model, together with a regularity assumption on the reward function, allows us to exploit the correlations among different game outcomes by means of Gaussian processes (GPs). We propose a novel confidence-bound based bandit algorithm GP-MW, which utilizes the GP model for the reward function and runs a multiplicative weight…
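The feedback model described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated, so the update rule, the RBF kernel, the toy reward function, and the parameters `beta`, `eta`, `noise`, and `lengthscale` below are all assumptions chosen for illustration. The sketch fits a GP to observed (own action, opponent action, noisy reward) triples, forms an optimistic upper confidence bound for every own action given the opponent's observed action, and feeds those bounds to a standard multiplicative weights update.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_query, noise=0.1):
    # Standard GP regression: posterior mean and standard deviation at X_query.
    if len(X_obs) == 0:
        return np.zeros(len(X_query)), np.ones(len(X_query))
    K = rbf_kernel(X_obs, X_obs) + noise**2 * np.eye(len(X_obs))
    Ks = rbf_kernel(X_query, X_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def mw_update(weights, ucb, eta):
    # Multiplicative weights: actions with higher optimistic reward gain mass.
    w = weights * np.exp(eta * ucb)
    return w / w.sum()

def run(T=50, K=5, beta=2.0, eta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    actions = np.linspace(0.0, 1.0, K)            # learner's finite action set
    reward = lambda a, b: np.exp(-4.0 * (a - b) ** 2)  # toy reward, hidden from the learner
    weights = np.full(K, 1.0 / K)
    X, y = np.empty((0, 2)), np.empty(0)
    for _ in range(T):
        i = rng.choice(K, p=weights)              # sample own action from the mixed strategy
        b = rng.uniform()                         # opponent's action, observed after playing
        r = reward(actions[i], b) + 0.1 * rng.standard_normal()  # noisy bandit feedback
        X = np.vstack([X, [actions[i], b]])
        y = np.append(y, r)
        # Optimistic reward estimate for every own action against the observed opponent action.
        Q = np.column_stack([actions, np.full(K, b)])
        mean, std = gp_posterior(X, y, Q)
        ucb = np.clip(mean + beta * std, 0.0, 1.0)
        weights = mw_update(weights, ucb, eta)
    return weights
```

Calling `run()` returns the learner's final mixed strategy over its `K` actions. The point of the design is that one noisy observation updates the estimate for all actions at once through the kernel, which is what lets a bandit learner imitate a full-information update.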

Code & Models

Repositories

sessap/noregretgames

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference