A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

Anna Winnicki; R. Srikant

arXiv:2303.09716·cs.LG·October 31, 2023·1 cites

TL;DR

This paper introduces a simple variant of naive policy iteration for zero-sum Markov games that converges exponentially fast. It also shows that the lookahead policies the variant relies on can be implemented efficiently in the linear function approximation setting, and it provides bounds for policy-based reinforcement learning algorithms.

Contribution

It proposes a novel variant of naive policy iteration using lookahead policies that guarantees exponential convergence in zero-sum Markov games.

Findings

01 The new algorithm converges exponentially fast.

02 Lookahead policies can be implemented efficiently in the linear function approximation setting.

03 The paper provides bounds for policy-based reinforcement learning algorithms.

Abstract

Optimal policies in standard MDPs can be obtained using either value iteration or policy iteration. However, in the case of zero-sum Markov games, there is no efficient policy iteration algorithm; e.g., it has been shown that one has to solve Omega(1/(1-alpha)) MDPs, where alpha is the discount factor, to implement the only known convergent version of policy iteration. Another algorithm, called naive policy iteration, is easy to implement but is only provably convergent under very restrictive assumptions. Prior attempts to fix the naive policy iteration algorithm have several limitations. Here, we show that a simple variant of naive policy iteration for games converges exponentially fast. The only addition we propose to naive policy iteration is the use of lookahead policies, which are used in practical algorithms anyway. We further show that lookahead can be implemented efficiently in the…
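
To make the idea of "naive policy iteration plus lookahead" concrete, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the paper's exact algorithm: the abstract is truncated, so the update rule used here (apply the minimax Bellman operator H-1 times, take the policies that solve the resulting matrix games, then evaluate them with m fixed-policy backups) is a guess at the general shape. The names `solve_2x2`, `lookahead_naive_pi`, `random_game`, and the parameters `H` and `m` are hypothetical, and the game is restricted to two actions per player so that each matrix game can be solved in closed form without an LP solver.

```python
import random

def solve_2x2(M):
    """Value and equilibrium mixed strategies of a 2x2 zero-sum matrix game
    (row player maximizes, column player minimizes)."""
    (a, b), (c, d) = M
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:  # pure saddle point exists
        i, j = row_mins.index(maximin), col_maxs.index(minimax)
        x = [1.0 if k == i else 0.0 for k in range(2)]
        y = [1.0 if k == j else 0.0 for k in range(2)]
        return maximin, x, y
    # No saddle point: both players mix so the opponent is indifferent.
    denom = a + d - b - c
    p = (d - c) / denom  # probability the row player picks row 0
    q = (d - b) / denom  # probability the column player picks column 0
    return (a * d - b * c) / denom, [p, 1 - p], [q, 1 - q]

def q_matrix(game, V, s):
    """Stage game at state s: r(s,a,b) + alpha * E[V(next state)]."""
    r, P, alpha = game
    return [[r[s][a][b] + alpha * sum(p * v for p, v in zip(P[s][a][b], V))
             for b in range(2)] for a in range(2)]

def bellman(game, V):
    """Minimax Bellman operator: solve one matrix game per state."""
    return [solve_2x2(q_matrix(game, V, s))[0] for s in range(len(V))]

def policy_backup(game, V, mu, nu):
    """One-step backup under fixed mixed policies mu (max) and nu (min)."""
    out = []
    for s in range(len(V)):
        Q = q_matrix(game, V, s)
        out.append(sum(mu[s][a] * nu[s][b] * Q[a][b]
                       for a in range(2) for b in range(2)))
    return out

def lookahead_naive_pi(game, H, m, iters):
    """ASSUMED update rule (not taken from the paper): lookahead of depth H
    for policy improvement, then m fixed-policy backups for evaluation."""
    n = len(game[0])
    V = [0.0] * n
    mu = nu = [[0.5, 0.5]] * n
    for _ in range(iters):
        W = V
        for _ in range(H - 1):          # lookahead: apply the operator H-1 times
            W = bellman(game, W)
        sols = [solve_2x2(q_matrix(game, W, s)) for s in range(n)]
        mu = [x for _, x, _ in sols]    # improved policy, max player
        nu = [y for _, _, y in sols]    # improved policy, min player
        V = W
        for _ in range(m):              # approximate evaluation of (mu, nu)
            V = policy_backup(game, V, mu, nu)
    return V, mu, nu

def value_iteration(game, iters):
    """Plain minimax value iteration, used only as a reference solution."""
    V = [0.0] * len(game[0])
    for _ in range(iters):
        V = bellman(game, V)
    return V

def random_game(n_states, alpha, seed):
    """Hypothetical random test game with 2 actions per player."""
    rng = random.Random(seed)
    r = [[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
         for _ in range(n_states)]
    P = []
    for _ in range(n_states):
        rows = []
        for _ in range(2):
            cols = []
            for _ in range(2):
                w = [rng.random() for _ in range(n_states)]
                cols.append([x / sum(w) for x in w])
            rows.append(cols)
        P.append(rows)
    return r, P, alpha
```

A possible usage is `game = random_game(3, 0.5, 0)` followed by `lookahead_naive_pi(game, H=3, m=5, iters=60)`, with the result compared against `value_iteration(game, 200)`. With `m=1` the sketch reduces to H steps of value iteration per iteration; for larger `m` the lookahead depth has to be large enough relative to the discount factor, which is why the example pairs a small alpha with `H=3`.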

Taxonomy

Topics: Reinforcement Learning in Robotics