A Multi-Step Minimax Q-learning Algorithm for Two-Player Zero-Sum Markov Games

Shreyas S R; Antony Vijesh

arXiv:2407.04240·cs.LG·September 23, 2025

TL;DR

This paper introduces a two-step minimax Q-learning algorithm for two-player zero-sum Markov games, proving its convergence to the optimal value without prior model knowledge and demonstrating its effectiveness through simulations.

Contribution

The paper presents a novel two-step minimax Q-learning algorithm with proven convergence in zero-sum Markov games, even without model information.
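For orientation, the optimal value that convergence refers to can be written as the fixed point of the standard minimax Bellman equation for discounted zero-sum Markov games. This is the textbook formulation, not an equation quoted from the paper. The symbols (state s, actions a and b for the maximizing and minimizing player, reward r, transition kernel p, discount factor γ) are assumed notation.

```latex
\[
Q^{*}(s,a,b) \;=\; r(s,a,b) \;+\; \gamma \sum_{s'} p(s' \mid s,a,b)\,
\operatorname{val}\!\big[\,Q^{*}(s',\cdot,\cdot)\,\big],
\qquad
\operatorname{val}[M] \;=\; \max_{\pi \in \Delta(A)} \; \min_{b \in B} \; \sum_{a \in A} \pi(a)\, M(a,b).
\]
```

Here val[M] is the value of the matrix game M for the maximizing player, and "without model information" means that r and p are only sampled, never given explicitly.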

Findings

01. Algorithm converges to the game-theoretic optimal value with probability one.

02. The proposed method is effective and easy to implement.

03. Numerical simulations validate the theoretical results.

Abstract

An iterative procedure is proposed to solve two-player zero-sum Markov games. Under suitable assumptions, the boundedness of the proposed iterates is established theoretically. Using results from stochastic approximation, the almost sure convergence of the proposed two-step minimax Q-learning algorithm is proved. More specifically, the proposed algorithm converges to the game-theoretic optimal value with probability one when the model information is not known. Numerical simulations confirm that the proposed algorithm is effective and easy to implement.
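The abstract does not spell out the two-step update, so the sketch below shows only the classical one-step minimax Q-learning baseline that such methods build on. It is not the authors' algorithm. Everything named here is an illustrative assumption: the restriction to two actions per player (which allows a closed-form matrix game value), the step size 1/n^0.6, the uniform exploration, and the matching-pennies test game.

```python
import random


def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game for the row (maximizing) player."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure guarantee for the column player
    if maximin == minimax:
        # A saddle point in pure strategies exists.
        return maximin
    # No saddle point: both players mix, and the value has a closed form.
    return (a * d - b * c) / ((a - b) + (d - c))


def minimax_q_learning(reward, transition, n_states, gamma=0.5,
                       steps=20000, seed=0):
    """One-step minimax Q-learning from samples only (no model access).

    reward(s, a, b) and transition(s, a, b, rng) are hypothetical callables
    that stand in for the unknown game; actions a, b are in {0, 1}.
    """
    rng = random.Random(seed)
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_states)]
    visits = [[[0, 0], [0, 0]] for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        # Assumption: both players explore uniformly at random.
        a, b = rng.randrange(2), rng.randrange(2)
        r = reward(s, a, b)
        s_next = transition(s, a, b, rng)
        visits[s][a][b] += 1
        # Assumed step size: sums to infinity, squares are summable.
        alpha = 1.0 / visits[s][a][b] ** 0.6
        # Bootstrap with the matrix game value at the next state.
        target = r + gamma * matrix_game_value(q[s_next])
        q[s][a][b] += alpha * (target - q[s][a][b])
        s = s_next
    return q


# Hypothetical test game: matching pennies repeated in a single state.
PENNIES = [[1.0, -1.0], [-1.0, 1.0]]


def pennies_reward(s, a, b):
    return PENNIES[a][b]


def pennies_transition(s, a, b, rng):
    return 0  # the game always stays in state 0
```

Calling `minimax_q_learning(pennies_reward, pennies_transition, n_states=1)` should return a Q-table close to `PENNIES` itself, because the value of matching pennies is zero and the discounted continuation term therefore vanishes. For more than two actions per player, the closed form would have to be replaced by a linear program.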

Code & Models

Repositories

shreyassr123/multi-step-markov-games (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques

Methods: Q-Learning