Deep SOR Minimax Q-learning for Two-player Zero-sum Game

Saksham Gautam; Lakshmi Mandal; Shalabh Bhatnagar

arXiv:2511.16226·cs.LG·November 21, 2025

TL;DR

This paper introduces a deep successive over-relaxation (SOR) minimax Q-learning algorithm for two-player zero-sum games, extending earlier tabular SOR methods to high-dimensional state-action spaces through neural-network function approximation, and proves its finite-time convergence.

Contribution

It presents a deep successive over-relaxation minimax Q-learning algorithm with a finite-time convergence proof, described as the first such method for high-dimensional settings.

Findings

01. The proposed method outperforms existing Q-learning algorithms in numerical experiments.

02. Ablation studies show the impact of the relaxation parameter on performance.

Abstract

In this work, we consider the problem of a two-player zero-sum game. The successive over-relaxation (SOR) Q-learning algorithm developed in the literature yields a lower contraction factor for the associated Q-Bellman operator, which results in a faster value-iteration-based procedure. However, it has been presented only for the tabular case, not for the function-approximation setting that real-world high-dimensional state-action spaces typically require. Furthermore, such settings have not been considered for two-player zero-sum games. We thus propose a deep successive over-relaxation minimax Q-learning algorithm that incorporates deep neural networks as function approximators and is suitable for high-dimensional spaces. We prove the finite-time convergence of the proposed algorithm. Through numerical experiments, we show the…
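The "lower contraction factor" mentioned in the abstract can be illustrated in the tabular case. The sketch below is not taken from this page: it assumes the SOR minimax Q-Bellman operator has the form used in the earlier tabular SOR literature, (H_w Q)(s,u,v) = w [r(s,u,v) + γ Σ_s' p(s'|s,u,v) val Q(s')] + (1 − w) val Q(s), where val Q(s) is the value of the matrix game Q(s,·,·) and w is the relaxation parameter. The function names, array shapes, and the use of `scipy.optimize.linprog` to solve the matrix game are illustrative choices, and no neural network is involved.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via an LP."""
    m, n = M.shape
    # Variables: mixed strategy x (m entries) followed by the game value t.
    c = np.zeros(m + 1)
    c[-1] = -1.0  # linprog minimizes, so minimize -t to maximize t
    # For every column v: t - sum_u x_u * M[u, v] <= 0
    A_ub = np.hstack([-M.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # x must be a probability distribution; t is unconstrained in sign
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return float(res.x[-1])


def sor_minimax_operator(Q, R, P, gamma, w):
    """Apply the assumed SOR minimax Q-Bellman operator once.

    Q, R: arrays of shape (S, U, V); P: array of shape (S, U, V, S).
    """
    # val Q(s) for every state s
    vals = np.array([matrix_game_value(Q[s]) for s in range(Q.shape[0])])
    # Standard minimax Bellman backup: r + gamma * E[val Q(s')]
    standard = R + gamma * P @ vals
    # Over-relaxation: blend the backup with the current state's game value
    return w * standard + (1.0 - w) * vals[:, None, None]
```

With w = 1 the sketch reduces to the standard minimax backup. Under the stated assumption, and for 0 < w ≤ 1 / (1 − γ min p(s|s,u,v)), the operator contracts in the sup-norm with factor 1 − w + wγ, which is below γ whenever w > 1; that requires every state-action pair to have a positive self-loop probability.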

Taxonomy

Topics: Adaptive Dynamic Programming Control · Optimization and Variational Analysis · Reinforcement Learning in Robotics