A Multi-Step Minimax Q-learning Algorithm for Two-Player Zero-Sum Markov Games
Shreyas S R, Antony Vijesh

TL;DR
This paper introduces a two-step minimax Q-learning algorithm for two-player zero-sum Markov games, proving its convergence to the optimal value without prior model knowledge and demonstrating its effectiveness through simulations.
Contribution
The paper presents a novel two-step minimax Q-learning algorithm with proven convergence in zero-sum Markov games, even without model information.
Findings
The algorithm converges to the game-theoretic optimal value with probability one.
The proposed method is effective and easy to implement.
Numerical simulations validate the theoretical results.
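For reference, the "game-theoretic optimal value" in a discounted two-player zero-sum Markov game is usually characterized by the minimax Bellman equation below. The notation is an assumption, since this page does not give the paper's symbols: s is a state, a and o are the actions of the maximizing and minimizing players, r is the reward, p is the transition kernel, and γ is the discount factor.

```latex
Q^*(s,a,o) = r(s,a,o) + \gamma \sum_{s'} p(s' \mid s,a,o)\, V^*(s'),
\qquad
V^*(s) = \max_{\pi \in \Delta(A)} \; \min_{o \in O} \; \sum_{a \in A} \pi(a)\, Q^*(s,a,o)
```

Here Δ(A) denotes the set of mixed strategies over the maximizing player's actions. A model-free method has to approximate Q* from sampled transitions alone, without access to r or p.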
Abstract
An iterative procedure is proposed to solve two-player zero-sum Markov games. Under suitable assumptions, the boundedness of the proposed iterates is established. Using results from stochastic approximation, the almost sure convergence of the proposed two-step minimax Q-learning is proved. More specifically, the proposed algorithm converges to the game-theoretic optimal value with probability one when the model information is not known. Numerical simulations confirm that the proposed algorithm is effective and easy to implement.
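The abstract does not state the two-step update rule, so it is not reproduced here. As a point of comparison, the following is a minimal sketch of the classical one-step minimax Q-learning update that such methods build on, not the authors' algorithm. It assumes a tabular game with two actions per player, so the stage-game value can be computed in closed form. The function names, the step-size schedule, and the toy game used afterwards are all hypothetical.

```python
import random


def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game for the row (maximizing) player."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # best pure-strategy guarantee for the row player
    upper = min(max(a, c), max(b, d))  # best pure-strategy cap for the column player
    if lower == upper:                 # saddle point in pure strategies
        return lower
    # No saddle point: standard closed-form mixed-strategy value for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)


def minimax_q(rewards, next_state, gamma, steps, seed=0):
    """One-step tabular minimax Q-learning with uniformly random exploration.

    rewards[s][a][o] and next_state[s][a][o] act as a hypothetical simulator;
    the learner only sees the sampled reward and next state, never the tables.
    """
    rng = random.Random(seed)
    n_states = len(rewards)
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_states)]
    visits = [[[0, 0], [0, 0]] for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a, o = rng.randrange(2), rng.randrange(2)  # both players explore uniformly
        r, s2 = rewards[s][a][o], next_state[s][a][o]
        visits[s][a][o] += 1
        alpha = visits[s][a][o] ** -0.6            # assumed decaying step size
        target = r + gamma * matrix_game_value(q[s2])
        q[s][a][o] += alpha * (target - q[s][a][o])
        s = s2
    return q
```

As a sanity check on a hypothetical single-state game with reward matrix [[2, -1], [-1, 1]] and γ = 0.5, the stage game has value 0.2, so the discounted optimal value is 0.2 / (1 - 0.5) = 0.4. Calling `minimax_q([[[2, -1], [-1, 1]]], [[[0, 0], [0, 0]]], 0.5, 20000)` and passing the learned table for state 0 to `matrix_game_value` should return a number close to that.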
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques
Methods: Q-Learning
