# A Generalized Minimax Q-learning Algorithm for Two-Player Zero-Sum Stochastic Games

**Authors:** Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

arXiv: 1906.06659 · 2022-03-21

## TL;DR

This paper extends successive relaxation to two-player zero-sum stochastic games, derives a generalized minimax Q-learning algorithm that does not require model information, and proves the algorithm's convergence under a boundedness assumption on the iterates.

## Contribution

The paper extends successive relaxation from Markov Decision Processes to two-player zero-sum games, derives a model-free generalized minimax Q-learning algorithm from it, and proves that algorithm's convergence using stochastic approximation techniques.

## Key findings

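- Under a special structure on the game, successive relaxation speeds up computation of the min-max value of the states
- The generalized minimax Q-learning algorithm is proven to converge using stochastic approximation techniques, assuming the iterates remain bounded
- Experiments demonstrate the effectiveness of the proposed algorithm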
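
As a rough guide to the first finding, successive relaxation is commonly written as a weighted combination of the usual min-max Bellman operator and the identity. The notation below ($T$, $T_w$, relaxation parameter $w$, discount factor $\alpha$, reward $r$, transition probabilities $p$) is assumed for illustration and does not appear in this summary; the complete paper should be consulted for its exact definitions and conditions.

$$
(TV)(i) = \operatorname{val}_{u,v}\Big[\, r(i,u,v) + \alpha \sum_{j} p(j \mid i,u,v)\, V(j) \Big],
\qquad
(T_w V)(i) = w\,(TV)(i) + (1-w)\,V(i)
$$

$$
0 < w \le w^{*} = \frac{1}{1 - \alpha \min_{i,u,v} p(i \mid i,u,v)},
\qquad
\lVert T_w V - T_w U \rVert_\infty \le (1 - w + w\alpha)\, \lVert V - U \rVert_\infty
$$

Under these assumptions $T_w$ has the same fixed point as $T$, and its contraction modulus $1 - w + w\alpha$ is smaller than $\alpha$ whenever $w > 1$. A choice of $w > 1$ is only available when $w^{*} > 1$, that is, when every state has a positive self-transition probability under every action pair, which is one plausible reading of the "special structure" mentioned above.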

## Abstract

We consider the problem of two-player zero-sum games. This problem is formulated as a min-max Markov game in the literature. The solution of this game, which is the min-max payoff, starting from a given state is called the min-max value of the state. In this work, we compute the solution of the two-player zero-sum game utilizing the technique of successive relaxation that has been successfully applied in the literature to compute a faster value iteration algorithm in the context of Markov Decision Processes. We extend the concept of successive relaxation to the setting of two-player zero-sum games. We show that, under a special structure on the game, this technique facilitates faster computation of the min-max value of the states. We then derive a generalized minimax Q-learning algorithm that computes the optimal policy when the model information is not known. Finally, we prove the convergence of the proposed generalized minimax Q-learning algorithm utilizing stochastic approximation techniques, under an assumption on the boundedness of iterates. Through experiments, we demonstrate the effectiveness of our proposed algorithm.
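
The model-free update described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the update takes the usual successive relaxation form (a weight `w` on the sampled min-max target and `1 - w` on the min-max value of the current state), and it restricts both players to two actions so that the value of each stage game has a closed form. The function names, the parameter names (`alpha` for the discount factor, `w` for the relaxation parameter, `step` for the step size), and the nested-list layout of `q` are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows: maximizer's actions, columns: minimizer's actions


def matrix_game_value_2x2(m: Matrix) -> float:
    """Value of a 2x2 zero-sum matrix game in which the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # best pure-strategy guarantee for the maximizer
    upper = min(max(a, c), max(b, d))  # best pure-strategy guarantee for the minimizer
    if lower >= upper:
        # A pure-strategy saddle point exists.
        return lower
    # No saddle point: both players mix, and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def generalized_minimax_q_update(
    q: List[Matrix],
    state: int,
    u: int,
    v: int,
    reward: float,
    next_state: int,
    alpha: float,
    w: float,
    step: float,
) -> float:
    """Apply one sampled update to q[state][u][v] in place and return the new entry.

    Assumed target (illustrative form):
        w * (reward + alpha * val[q[next_state]]) + (1 - w) * val[q[state]]
    Setting w = 1 gives the standard minimax Q-learning target.
    """
    target = (
        w * (reward + alpha * matrix_game_value_2x2(q[next_state]))
        + (1.0 - w) * matrix_game_value_2x2(q[state])
    )
    q[state][u][v] += step * (target - q[state][u][v])
    return q[state][u][v]


if __name__ == "__main__":
    # Two states, each holding a 2x2 table of Q-values (made-up numbers).
    q_table = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    updated = generalized_minimax_q_update(
        q_table, state=0, u=0, v=0, reward=1.0, next_state=1,
        alpha=0.5, w=1.2, step=0.5,
    )
    print(updated)
```

For games with more than two actions per player, the closed-form helper would need to be replaced by a linear program that solves the stage matrix game. The relaxation parameter is left as a plain argument because its admissible range depends on the game's transition structure, which this summary does not specify.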

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.06659/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1906.06659/full.md

## References

19 references with the full list in the complete paper: https://tomesphere.com/paper/1906.06659/full.md

---
Source: https://tomesphere.com/paper/1906.06659