No-Regret Learning in Unknown Games with Correlated Payoffs
Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

TL;DR
This paper introduces a bandit learning algorithm for repeated multi-agent games with unknown, correlated payoffs. By modeling the reward function with Gaussian processes, it attains regret bounds close to those achievable with full-information feedback, and it is evaluated on practical applications.
Contribution
It proposes GP-MW, a novel kernel-based bandit algorithm that exploits correlations among game outcomes and outperforms existing bandit baselines in games with unknown rewards.
Findings
GP-MW achieves kernel-dependent regret bounds comparable to those attainable in the full-information setting.
The algorithm outperforms baselines in traffic routing and movie recommendation tasks.
Experimental results show that GP-MW often performs close to full-information methods.
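For reference, the standard notion of regret for player i and the textbook rates for a single player with K actions and rewards in [0, 1] are written out below, to make the "full information vs. bandit" comparison concrete. The notation is ours, and the two bounds are the classical Hedge (full-information) and Exp3 (bandit) guarantees for expected regret against an oblivious sequence of opponents' actions. They are not GP-MW's own bound, which the paper states in a kernel-dependent form that is not restated here.

```latex
R^i(T) = \max_{a \in \mathcal{A}^i} \sum_{t=1}^{T} r^i\big(a, a^{-i}_t\big) - \sum_{t=1}^{T} r^i\big(a^i_t, a^{-i}_t\big)

\text{Full information (Hedge, } \eta = \sqrt{8 \ln K / T}\text{):} \quad \mathbb{E}\big[R^i(T)\big] \le \sqrt{\tfrac{T}{2} \ln K}

\text{Bandit feedback (Exp3):} \quad \mathbb{E}\big[R^i(T)\big] = O\big(\sqrt{T K \ln K}\big)
```

The ratio of the two rates is of order √K, which illustrates why bandit feedback alone is substantially worse when a player has many actions.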
Abstract
We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single-player online learning algorithms attain strong regret bounds when provided with full-information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performance. In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents' actions. This feedback model, together with a regularity assumption on the reward function, allows us to exploit the correlations among different game outcomes by means of Gaussian processes (GPs). We propose a novel confidence-bound-based bandit algorithm GP-MW, which utilizes the GP model for the reward function and runs a multiplicative weight…
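The loop described in the abstract (a GP model of the reward over joint action profiles, optimistic confidence-bound estimates, and a multiplicative-weights update) can be sketched for a single player as below. This is a minimal, hypothetical illustration and not the authors' implementation. The names rbf, GP and gp_mw, the RBF kernel and its lengthscale, the confidence multiplier beta, the Hedge-style step size eta, clipping the estimates at 1, the order of the GP and weight updates, and the toy reward function are all our assumptions. It uses only the Python standard library.

```python
import math
import random


def rbf(u, v, lengthscale=0.3):
    """Squared-exponential kernel on joint action profiles (assumed kernel choice)."""
    d2 = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.exp(-d2 / (2 * lengthscale ** 2))


def forward(L, b):
    """Solve L x = b for lower-triangular L stored as rows of increasing length."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GP:
    """Zero-mean GP regression; the Cholesky factor grows by one row per observation."""

    def __init__(self, kernel, noise=0.1):
        self.kernel, self.noise = kernel, noise
        self.X, self.y, self.L, self.alpha = [], [], [], []

    def add(self, x, y):
        row = forward(self.L, [self.kernel(x, xi) for xi in self.X])
        diag = self.kernel(x, x) + self.noise ** 2 - sum(r * r for r in row)
        self.L.append(row + [math.sqrt(diag)])
        self.X.append(x)
        self.y.append(y)
        self.alpha = backward(self.L, forward(self.L, self.y))  # (K + noise^2 I)^-1 y

    def posterior(self, z):
        """Posterior mean and standard deviation at joint profile z."""
        kz = [self.kernel(z, xi) for xi in self.X]
        mean = sum(k * a for k, a in zip(kz, self.alpha))
        v = forward(self.L, kz)
        var = self.kernel(z, z) - sum(t * t for t in v)
        return mean, math.sqrt(max(var, 0.0))


def gp_mw(actions, reward_fn, opponent, T, beta=2.0, noise=0.1, seed=0):
    """Hypothetical GP-MW-style loop for one player; returns final mixed strategy and GP."""
    rng = random.Random(seed)
    K = len(actions)
    eta = math.sqrt(8 * math.log(K) / T)  # Hedge step size (assumption)
    p = [1.0 / K] * K
    gp = GP(rbf, noise)
    for _ in range(T):
        i = rng.choices(range(K), weights=p)[0]  # play a sampled action
        b = opponent(rng)  # observe the opponents' action
        r = reward_fn(actions[i], b) + rng.gauss(0.0, noise)  # noisy bandit reward
        # Optimistic reward of every own action against the observed b, clipped at 1,
        # computed from the posterior fitted to previous rounds.
        ucb = []
        for a in actions:
            m, s = gp.posterior((a, b))
            ucb.append(min(m + beta * s, 1.0))
        # Multiplicative-weights update with losses 1 - ucb, then renormalize.
        w = [pk * math.exp(-eta * (1.0 - u)) for pk, u in zip(p, ucb)]
        total = sum(w)
        p = [wk / total for wk in w]
        gp.add((actions[i], b), r)
    return p, gp


def true_reward(a, b):
    """Toy reward, unknown to the learner (assumption); best own action is near 0.8."""
    return 1.0 - (a - 0.8) ** 2 - 0.2 * b


# Toy single-player view: the opponent plays uniformly at random (assumption).
actions = [0.0, 0.25, 0.5, 0.75, 1.0]
p, gp = gp_mw(actions, true_reward, lambda rng: rng.choice(actions), T=100)
print([round(x, 3) for x in p])
```

Because the player observes the opponents' action after each round, the GP can score every one of its own actions against that action, so a single bandit observation yields a full vector of (optimistic) reward estimates for the multiplicative-weights step. In this toy run the mixed strategy is expected to shift mass away from the clearly worse actions 0.0 and 0.25.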
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
