Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning

Qiaosheng Zhang; Chenjia Bai; Shuyue Hu; Zhen Wang; Xuelong Li

arXiv:2404.19292·cs.IT·May 1, 2024

Open Access

TL;DR

This paper introduces and analyzes information-directed sampling (IDS) algorithms for multi-agent reinforcement learning and proves them sample-efficient in two-player zero-sum and multi-player general-sum Markov games.

Contribution

The paper presents novel IDS-based algorithms for MARL, including MAIDS, Reg-MAIDS, and Compressed-MAIDS, with theoretical guarantees and extensions to multi-player general-sum games.

Findings

01

Achieves a Bayesian regret of Õ(√K) in two-player zero-sum Markov games

02

Reg-MAIDS lowers computational complexity relative to MAIDS while keeping the same Bayesian regret bound

03

Extends to multi-player general-sum Markov games, where equilibria are learned sample-efficiently

Abstract

This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure where the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of Õ(√K) for K…
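
The asymmetric structure described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a single-step 2x2 matrix game, assumes an information ratio of the usual IDS form (squared expected regret divided by expected information gain), assumes the max-player's minimax step means minimizing the worst-case joint ratio over the min-player's responses, and replaces the optimization with a grid search. The tables REGRET, JOINT_INFO, and MARGINAL_INFO and all function names are hypothetical placeholders, not quantities taken from the paper.

```python
from itertools import product

# Hypothetical per-joint-action quantities (illustrative numbers only).
REGRET = [[0.2, 1.0], [0.8, 0.1]]          # assumed expected regret
JOINT_INFO = [[0.5, 1.0], [0.4, 0.2]]      # assumed joint information gain (> 0)
MARGINAL_INFO = [[0.3, 0.6], [0.2, 0.1]]   # assumed marginal information gain (> 0)


def information_ratio(mu, nu, regret, info):
    """Squared expected regret divided by expected information gain."""
    pairs = list(product(range(len(mu)), range(len(nu))))
    exp_regret = sum(mu[i] * nu[j] * regret[i][j] for i, j in pairs)
    exp_info = sum(mu[i] * nu[j] * info[i][j] for i, j in pairs)
    return exp_regret ** 2 / exp_info


def simplex_grid(steps=10):
    """Mixed strategies over two actions on a uniform grid."""
    return [(k / steps, 1 - k / steps) for k in range(steps + 1)]


def asymmetric_ids_step(regret, joint_info, marginal_info, steps=10):
    grid = simplex_grid(steps)
    # Step 1 (max-player): minimize the worst-case joint information ratio.
    mu = min(
        grid,
        key=lambda m: max(information_ratio(m, n, regret, joint_info) for n in grid),
    )
    # Step 2 (min-player): with mu fixed, minimize the marginal information ratio.
    nu = min(grid, key=lambda n: information_ratio(mu, n, regret, marginal_info))
    return mu, nu


mu_star, nu_star = asymmetric_ids_step(REGRET, JOINT_INFO, MARGINAL_INFO)
print("max-player policy:", mu_star)
print("min-player policy:", nu_star)
```

The point of the sketch is the ordering of the two steps: the min-player's choice depends on the max-player's already-fixed policy, which is what makes the structure asymmetric. In the paper's setting the policies are defined over an episodic Markov game rather than a single matrix game.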

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems

Methods: Sparse Evolutionary Training