Cooperative Online Learning in Stochastic and Adversarial MDPs

Tal Lancewicki; Aviv Rosenberg; Yishay Mansour

arXiv:2201.13170 · cs.LG · September 2, 2022

TL;DR

This paper studies cooperative online learning in stochastic and adversarial MDPs under two randomness models, fresh and non-fresh, and proves nearly-matching regret lower and upper bounds. It is presented as the first work on cooperative RL with non-fresh randomness and in adversarial MDPs.

Contribution

It gives the first analysis of cooperative RL with non-fresh randomness and in adversarial MDPs, with regret bounds for each of these settings.

Findings

01. Nearly-matching regret lower and upper bounds for all settings.

02. First to analyze cooperative RL with non-fresh randomness.

03. Differentiates challenges between stochastic and adversarial environments.

Abstract

We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs). That is, in each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: \emph{fresh} -- where each agent's trajectory is sampled i.i.d., and \emph{non-fresh} -- where the realization is shared by all agents (but each agent's trajectory is also affected by its own actions). More precisely, with non-fresh randomness the realization of every cost and transition is fixed at the start of each episode, and agents that take the same action in the same state at the same time observe the same cost and next state. We thoroughly analyze all relevant settings, highlight the challenges and differences between the models, and prove nearly-matching regret lower and upper bounds. To…
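The difference between the two randomness models can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function names `sample_realization` and `run_episode`, the uniform costs and transitions, and the fixed initial state 0 are all assumptions made for illustration. With non-fresh randomness, one realization is drawn at the start of the episode and shared by all agents; with fresh randomness, each agent draws its own.

```python
import random


def sample_realization(horizon, n_states, n_actions, rng):
    """Draw one (cost, next_state) pair for every (step, state, action).

    Uniform costs and uniform transitions are placeholders, not the paper's model.
    """
    return {
        (h, s, a): (rng.random(), rng.randrange(n_states))
        for h in range(horizon)
        for s in range(n_states)
        for a in range(n_actions)
    }


def run_episode(policies, horizon, n_states, n_actions, fresh, rng):
    """Roll out every agent for one episode and return their trajectories."""
    # Non-fresh: a single realization is fixed at the start of the episode.
    shared = None if fresh else sample_realization(horizon, n_states, n_actions, rng)
    trajectories = []
    for policy in policies:
        # Fresh: each agent gets its own independent realization.
        table = sample_realization(horizon, n_states, n_actions, rng) if fresh else shared
        state, trajectory = 0, []  # assumed fixed initial state 0
        for h in range(horizon):
            action = policy(h, state)
            cost, next_state = table[(h, state, action)]
            trajectory.append((state, action, cost, next_state))
            state = next_state
        trajectories.append(trajectory)
    return trajectories
```

In this sketch, two agents running the same deterministic policy under `fresh=False` produce identical trajectories, so the second agent observes nothing new. Under `fresh=True` their trajectories are independent samples.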

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Advanced Bandit Algorithms Research