Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
Qiwen Cui, Simon S. Du

TL;DR
This paper introduces a strategy-wise bonus approach for offline multi-agent reinforcement learning, achieving improved sample complexity bounds and computational efficiency over prior point-wise methods, especially in large action spaces.
Contribution
The paper proposes the strategy-wise concentration principle, which yields algorithms for multi-agent Markov games with better sample complexity and computational efficiency than point-wise methods.
Findings
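Sample complexity scales with the sum of the players' action-set sizes, not with the size of the joint action space.
Algorithms can take a pre-specified strategy class as input, with sample complexity depending logarithmically on the size of that class.
In two-player zero-sum Markov games, the algorithm achieves a better dependency on the number of actions than prior methods based on the point-wise bonus.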
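To make the first finding concrete, the following minimal sketch compares the sum of per-player action-set sizes with the size of the joint action space. The function names and the example sizes are hypothetical, chosen only for illustration. The printed numbers are raw counts, not the paper's sample complexity bounds, which also depend on quantities not given in this summary.

```python
from math import prod

def summed_action_count(action_sizes):
    """Sum of per-player action-set sizes: A_1 + ... + A_m."""
    return sum(action_sizes)

def joint_action_count(action_sizes):
    """Size of the joint action space: A_1 * ... * A_m."""
    return prod(action_sizes)

# Hypothetical example: 4 players with 10 actions each (not taken from the paper).
action_sizes = [10, 10, 10, 10]
summed = summed_action_count(action_sizes)
joint = joint_action_count(action_sizes)

print(f"sum of action sizes: {summed}")
print(f"joint action space:  {joint}")
```

The gap widens exponentially with the number of players, which is why a dependency on the sum rather than the product matters in large multi-agent settings.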
Abstract
This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle that builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than the prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity only scales with $\sum_{i=1}^{m} A_i$, where $A_i$ is the action size of the $i$-th player and $m$ is the number of players. In sharp…
Taxonomy
Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Auction Theory and Applications
