Value Function Approximation in Zero-Sum Markov Games
Michail Lagoudakis, Ron Parr

TL;DR
This paper extends value function approximation techniques from MDPs to zero-sum Markov games, providing error bounds and demonstrating least-squares policy iteration (LSPI) on a soccer domain and a flow control problem.
Contribution
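It generalizes error bounds and reinforcement learning algorithms from MDPs to zero-sum Markov games, and gives convergence guarantees for LSTD and temporal difference learning with linear value function approximation in a two-player generalization of the optimal stopping problem.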
Findings
Stronger bounds for the two-player stopping problem.
Convergence guarantees for LSTD and TD learning with linear value function approximation in the two-player stopping problem.
Successful application of LSPI in soccer and flow control domains.
Abstract
This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player, simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
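To make the "generalization of the MDP framework to the two-agent case" concrete, the sketch below shows how the max over actions in an MDP backup becomes the value of a zero-sum matrix game at each state. It is a minimal illustration, not the paper's LSPI procedure: it uses exact tabular values rather than function approximation, and it assumes each player has exactly two actions so the stage game can be solved in closed form. The names `solve_2x2` and `minimax_value_iteration`, and the one-state example game, are hypothetical.

```python
def solve_2x2(m):
    """Value of a 2x2 zero-sum matrix game for the maximizing row player."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure-strategy guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure-strategy cap for the column player
    if maximin == minimax:
        # Saddle point: pure strategies are optimal for both players.
        return maximin
    # No saddle point: both players mix; closed-form value of a 2x2 game.
    return (a * d - b * c) / (a + d - b - c)


def minimax_value_iteration(rewards, transitions, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Tabular minimax value iteration (assumption: two actions per player).

    rewards[s][a][o]        reward to the maximizer in state s for actions (a, o)
    transitions[s][a][o][t] probability of moving from s to t under (a, o)
    """
    n = len(rewards)
    values = [0.0] * n
    for _ in range(max_iters):
        new_values = []
        for s in range(n):
            # Build the stage game: Q(s, a, o) = R + gamma * E[V(next state)].
            q = [
                [
                    rewards[s][a][o]
                    + gamma * sum(p * values[t] for t, p in enumerate(transitions[s][a][o]))
                    for o in range(2)
                ]
                for a in range(2)
            ]
            # The MDP "max over actions" is replaced by the matrix-game value.
            new_values.append(solve_2x2(q))
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values


# Hypothetical one-state game: the maximizer earns 2 when actions match, else 0,
# and the game always returns to the same state.
example_rewards = [[[2.0, 0.0], [0.0, 2.0]]]
example_transitions = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
example_values = minimax_value_iteration(example_rewards, example_transitions, gamma=0.9)
```

In the example game the stage value is 1 per step (both players mix uniformly), so the discounted value approaches 1 / (1 - 0.9) = 10. With larger action sets the stage game would typically be solved by linear programming instead of the closed form used here.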
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Simulation Techniques and Applications
