Decentralized model-free reinforcement learning in stochastic games with average-reward objective

Romain Cravic; Nicolas Gast; Bruno Gaujal

arXiv:2301.05630·cs.LG·January 16, 2023

Open Access

TL;DR

This paper introduces DONQ-learning, a model-free algorithm for decentralized two-player zero-sum stochastic games with average reward, achieving low regret with efficient computation and memory use.

Contribution

It presents the first decentralized model-free algorithm with provable low regret guarantees for infinite-horizon stochastic games under average reward.

Findings

01 Achieves sublinear regret of order T^{3/4} with high probability.

02 Achieves sublinear expected regret of order T^{2/3}.

03 Has low computational complexity and memory requirements.
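
To make the two regret orders concrete, one can divide by the horizon T. This is a minimal sketch, not a statement from the paper: R_T is an assumed symbol for the cumulative regret after T steps, and constants and any logarithmic factors are ignored.

```latex
\frac{R_T}{T} = O\!\left(T^{-1/4}\right) \ \text{with high probability},
\qquad
\frac{\mathbb{E}[R_T]}{T} = O\!\left(T^{-1/3}\right).
```

Both per-step quantities vanish as T grows, which is what "sublinear" regret means here.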

Abstract

We propose the first model-free algorithm that achieves low regret for decentralized learning in two-player zero-sum tabular stochastic games with an infinite-horizon average-reward objective. In decentralized learning, the learning agent controls only one player and tries to achieve low regret against an arbitrary opponent. This contrasts with centralized learning, where the agent tries to approximate the Nash equilibrium by controlling both players. In our infinite-horizon undiscounted setting, additional structural assumptions are needed for the learning process to behave well: here we assume that, for every strategy of the opponent, the agent has a way to go from any state to any other. This assumption is analogous to the "communicating" assumption in the MDP setting. We show that our Decentralized Optimistic Nash Q-Learning (DONQ-learning) algorithm…
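
The "communicating" assumption mentioned in the abstract can be illustrated in the simpler single-agent MDP setting, where it amounts to every state being reachable from every other state under some choice of actions. The sketch below is only that MDP analogue, not the paper's game-theoretic condition or its algorithm; the function names and the nested-list layout `P[s][a][t]` are hypothetical choices made for illustration.

```python
from collections import deque


def reachable(adj, start):
    """Return the set of states reachable from `start` via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in adj[s]:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def is_communicating(P):
    """Check the communicating property of a tabular MDP.

    P[s][a][t] is the probability of moving from state s to state t
    under action a (an assumed layout, not taken from the paper).
    """
    n = len(P)
    # Edge s -> t exists if at least one action moves s to t with positive probability.
    adj = [
        {t for action_row in P[s] for t, p in enumerate(action_row) if p > 0}
        for s in range(n)
    ]
    # Communicating: every state reaches all n states in this graph.
    return all(len(reachable(adj, s)) == n for s in range(n))


# Two states, two actions each: action 0 stays put, action 1 switches state.
two_state = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

# State 1 is absorbing, so state 0 can never be reached from it.
absorbing = [
    [[0.5, 0.5]],
    [[0.0, 1.0]],
]
```

In the stochastic-game setting described above, the requirement is stronger, since reachability must hold for every strategy of the opponent; this sketch does not check that.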

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization

Methods: Q-Learning