Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning
Qiaosheng Zhang, Chenjia Bai, Shuyue Hu, Zhen Wang, Xuelong Li

TL;DR
This paper introduces and analyzes information-directed sampling (IDS) algorithms for multi-agent reinforcement learning (MARL), proving that they are sample efficient in two-player zero-sum and multi-player general-sum Markov games.
Contribution
The paper presents novel IDS-based algorithms for MARL, including MAIDS, Reg-MAIDS, and Compressed-MAIDS, with theoretical guarantees and extensions to multi-player general-sum games.
Findings
Achieves Bayesian regret of Õ(√K) in two-player zero-sum Markov games
Reg-MAIDS lowers computational complexity relative to MAIDS while keeping the same Bayesian regret bound
Extends to multi-player general-sum Markov games, where equilibria are learned sample-efficiently
Abstract
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure where the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of $\tilde{O}(\sqrt{K})$ for K…
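The asymmetric two-stage structure described in the abstract can be illustrated with a toy, single-step sketch. Everything below is an assumption made for illustration and not the paper's definitions: the ratio is taken to be the squared expected regret divided by the expected information gain (the usual single-agent IDS form), the regret and information-gain tables are made-up numbers, the minimax is read as "min over the max-player's policy of the max over the min-player's policy", and the search runs over a coarse grid of two-action mixed policies rather than a real optimizer. The function names are hypothetical.

```python
import numpy as np

# Hypothetical toy inputs (NOT from the paper): per-joint-action expected regret
# for the joint objective, for the min-player's marginal objective, and the
# expected information gain of each joint action. Rows index the max-player's
# actions, columns index the min-player's actions.
JOINT_REGRET = np.array([[0.2, 0.6], [0.5, 0.1]])
MIN_REGRET = np.array([[0.3, 0.1], [0.2, 0.4]])
INFO_GAIN = np.array([[0.05, 0.30], [0.20, 0.10]])


def information_ratio(mu, nu, regret, info_gain, eps=1e-12):
    """Assumed ratio: (expected regret)^2 / expected information gain under mu x nu."""
    expected_regret = mu @ regret @ nu
    expected_gain = mu @ info_gain @ nu
    return expected_regret ** 2 / (expected_gain + eps)


def simplex_grid_2(n=101):
    """Grid of mixed policies over two actions: rows are [p, 1 - p]."""
    p = np.linspace(0.0, 1.0, n)
    return np.stack([p, 1.0 - p], axis=1)


def asymmetric_ids_step(joint_regret, min_regret, info_gain, n=101):
    """Two-stage selection mirroring the structure sketched in the abstract."""
    grid = simplex_grid_2(n)

    # Stage 1 (max-player): pick mu that minimizes the worst-case (over nu)
    # joint ratio. This min-max reading is an assumption.
    worst_case = np.array([
        max(information_ratio(mu, nu, joint_regret, info_gain) for nu in grid)
        for mu in grid
    ])
    mu_star = grid[np.argmin(worst_case)]

    # Stage 2 (min-player): with mu_star fixed, pick nu that minimizes the
    # marginal ratio, here built from the min-player's own regret table.
    marginal = np.array([
        information_ratio(mu_star, nu, min_regret, info_gain) for nu in grid
    ])
    nu_star = grid[np.argmin(marginal)]
    return mu_star, nu_star


mu_star, nu_star = asymmetric_ids_step(JOINT_REGRET, MIN_REGRET, INFO_GAIN)
print("max-player policy:", mu_star)
print("min-player policy:", nu_star)
```

The grid search is only practical for tiny action sets; it is used here to keep the two stages explicit, not as a suggestion for how the optimization problems would be solved in a Markov game.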
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems
Methods: Sparse Evolutionary Training
