Gradient-free Online Learning in Games with Delayed Rewards

Amélie Héliou; Panayotis Mertikopoulos; Zhengyuan Zhou

arXiv:2006.10911·cs.GT·June 22, 2020·6 cites

Open Access

TL;DR

This paper introduces a gradient-free learning approach for multi-player games with delayed, asynchronous rewards, proving convergence to Nash equilibrium despite unbounded delays in feedback.

Contribution

The paper develops a gradient-free learning policy for multi-player games with continuous action spaces and delayed feedback, derives new regret bounds for it, and proves convergence to Nash equilibrium under broad conditions.

Findings

01. New regret bounds for delayed reward settings

02. Convergence to Nash equilibrium with probability 1

03. Applicable to unbounded delay scenarios

Abstract

Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order, with an a priori unbounded delay, etc.), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that…
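The queueing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class `DelayedFeedbackQueue`, the helper `process_order`, and the choice to process the oldest pending round first are all assumptions made for illustration, and `one_point_estimate` is the standard single-point gradient estimator (payoff times a random unit direction, scaled by dimension over sampling radius), swapped in as a generic stand-in for a gradient-free update.

```python
import heapq


class DelayedFeedbackQueue:
    """Hypothetical priority queue of payoffs, keyed by the round they belong to."""

    def __init__(self):
        self._heap = []

    def push(self, played_round, payoff):
        # Payoffs may arrive out of order; the heap keeps the oldest round on top.
        heapq.heappush(self._heap, (played_round, payoff))

    def pop_oldest(self):
        # Returns (played_round, payoff), or None when nothing is pending.
        return heapq.heappop(self._heap) if self._heap else None


def one_point_estimate(payoff, direction, delta):
    """Single-point gradient estimate from one payoff value (no gradients observed).

    `direction` is the unit perturbation used when the action was played and
    `delta` is the sampling radius; both are assumptions of this sketch.
    """
    dim = len(direction)
    return [(dim / delta) * payoff * u for u in direction]


def process_order(arrivals, horizon):
    """Simulate which rounds get processed when feedback is delayed.

    `arrivals` maps an arrival round to a list of (played_round, payoff) pairs.
    At most one queued payoff is consumed per round (an assumption).
    """
    queue = DelayedFeedbackQueue()
    processed = []
    for t in range(1, horizon + 1):
        for played_round, payoff in arrivals.get(t, []):
            queue.push(played_round, payoff)
        item = queue.pop_oldest()
        if item is not None:
            processed.append(item[0])
    return processed
```

For example, if the payoff of round 2 arrives in round 2 while those of rounds 1 and 3 both arrive in round 3, `process_order({2: [(2, 0.5)], 3: [(3, 0.1), (1, 0.9)]}, 4)` consumes the feedback for rounds 2, 1, and 3 in that order: nothing is available in round 1, and once several payoffs are pending the oldest one is used first.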


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems