Deep SOR Minimax Q-learning for Two-player Zero-sum Game
Saksham Gautam, Lakshmi Mandal, Shalabh Bhatnagar

TL;DR
This paper introduces a deep successive over-relaxation (SOR) minimax Q-learning algorithm for two-player zero-sum games, extending earlier tabular SOR methods to high-dimensional state-action spaces through neural-network function approximation, and proves its finite-time convergence.
Contribution
It presents the first deep successive over-relaxation minimax Q-learning algorithm with convergence guarantees for high-dimensional settings.
Findings
The proposed method outperforms existing Q-learning algorithms in numerical experiments.
Ablation studies show the impact of the relaxation parameter on performance.
Abstract
In this work, we consider the problem of a two-player zero-sum game. In the literature, the successive over-relaxation Q-learning algorithm has been developed and implemented, and it is seen to result in a lower contraction factor for the associated Q-Bellman operator resulting in a faster value iteration-based procedure. However, this has been presented only for the tabular case and not for the setting with function approximation that typically caters to real-world high-dimensional state-action spaces. Furthermore, such settings in the case of two-player zero-sum games have not been considered. We thus propose a deep successive over-relaxation minimax Q-learning algorithm that incorporates deep neural networks as function approximators and is suitable for high-dimensional spaces. We prove the finite-time convergence of the proposed algorithm. Through numerical experiments, we show the…
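To make the over-relaxation idea in the abstract concrete, here is a minimal tabular sketch of an SOR-style minimax Q-learning step. It is an illustration under stated assumptions, not the paper's deep algorithm: the names `maximin_value`, `sor_minimax_target`, and `sor_minimax_update` are hypothetical, the relaxation parameter is written `w`, and the value of each stage game is approximated by the pure-strategy maximin rather than the mixed-strategy value (which would normally require solving a linear program). The target follows the general form used in SOR Q-learning, where `w = 1` recovers the standard minimax Q-learning target; the paper's exact update may differ.

```python
def maximin_value(payoff):
    """Pure-strategy maximin of payoff[a][b]: the maximizer picks a, the minimizer picks b.

    Assumption: a simplification of the true game value, which is defined over
    mixed strategies and usually computed with a linear program.
    """
    return max(min(row) for row in payoff)


def sor_minimax_target(reward, next_q, curr_q, gamma=0.9, w=1.1):
    """SOR-style target: w * (r + gamma * val(Q[s'])) + (1 - w) * val(Q[s])."""
    bootstrapped = reward + gamma * maximin_value(next_q)
    return w * bootstrapped + (1.0 - w) * maximin_value(curr_q)


def sor_minimax_update(q, s, a, b, reward, s_next, alpha=0.5, gamma=0.9, w=1.1):
    """Move Q[s][a][b] toward the SOR target with step size alpha (in place)."""
    target = sor_minimax_target(reward, q[s_next], q[s], gamma, w)
    q[s][a][b] += alpha * (target - q[s][a][b])
    return q[s][a][b]


if __name__ == "__main__":
    # Hypothetical toy problem: 2 states, 2 actions per player, Q initialised to zero.
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    new_value = sor_minimax_update(q, s=0, a=0, b=0, reward=1.0, s_next=1)
    print(new_value)
```

In a deep variant, the table `q` would be replaced by a neural network and the same target would serve as the regression label for sampled transitions; that part is not shown here.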
Taxonomy
Topics: Adaptive Dynamic Programming Control · Optimization and Variational Analysis · Reinforcement Learning in Robotics
