Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach
Donghwan Lee

TL;DR
This paper provides a finite-time convergence analysis of minimax Q-learning in two-player zero-sum Markov games using a switching system approach, bridging control theory and reinforcement learning.
Contribution
It introduces a novel switching system framework for analyzing minimax Q-learning and value iteration, offering clearer insights into their convergence properties.
Findings
Finite-time convergence guarantees for minimax Q-learning and the corresponding value iteration method.
Enhanced understanding of the relationship between control theory and reinforcement learning.
Potential for improved algorithms based on the switching system analysis.
Abstract
The objective of this paper is to investigate the finite-time analysis of a Q-learning algorithm applied to two-player zero-sum Markov games. Specifically, we establish a finite-time analysis of both the minimax Q-learning algorithm and the corresponding value iteration method. To enhance the analysis of both value iteration and Q-learning, we employ the switching system model of minimax Q-learning and the associated value iteration. This approach provides further insights into minimax Q-learning and facilitates a more straightforward and insightful convergence analysis. We anticipate that these additional insights have the potential to uncover novel connections and foster collaboration between the control theory and reinforcement learning communities.
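To make the two algorithms named in the abstract concrete, the following is a minimal tabular sketch, not the paper's own implementation. It assumes a pure-strategy max-min backup (the user maximizes over its action, the adversary then minimizes over its own), uniform i.i.d. sampling of state-action triples, and a constant step size; the original minimax-Q formulation for simultaneous-move games instead solves a matrix game over mixed strategies at each state. The function names, the random game, and all parameter values are hypothetical.

```python
import numpy as np

def minimax_backup(Q):
    # Q has shape (S, A, B). Assumption: the adversary minimizes over b
    # after the user picks a, so the state value is max_a min_b Q(s, a, b).
    return Q.min(axis=2).max(axis=1)  # shape (S,)

def minimax_value_iteration(P, R, gamma, iters=500):
    # P: (S, A, B, S) transition probabilities, R: (S, A, B) expected rewards.
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ minimax_backup(Q)
    return Q

def minimax_q_learning(P, R, gamma, steps, alpha, rng):
    # Sample-based counterpart of the iteration above with constant step size.
    S, A, B = R.shape
    Q = np.zeros((S, A, B))
    for _ in range(steps):
        # Assumption: (s, a, b) is drawn uniformly and independently each step.
        s, a, b = rng.integers(S), rng.integers(A), rng.integers(B)
        s_next = rng.choice(S, p=P[s, a, b])
        target = R[s, a, b] + gamma * Q[s_next].min(axis=1).max()
        Q[s, a, b] += alpha * (target - Q[s, a, b])
    return Q

# Hypothetical random game: 3 states, 2 user actions, 2 adversary actions.
rng = np.random.default_rng(0)
S, A, B, gamma = 3, 2, 2, 0.9
P = rng.random((S, A, B, S))
P /= P.sum(axis=-1, keepdims=True)  # normalize each row into a distribution
R = rng.random((S, A, B))

Q_star = minimax_value_iteration(P, R, gamma)
Q_hat = minimax_q_learning(P, R, gamma, steps=20000, alpha=0.1, rng=rng)
print("sup-norm error of Q-learning iterate:", np.max(np.abs(Q_hat - Q_star)))
```

With a constant step size the Q-learning iterate is expected to settle in a neighborhood of the value-iteration fixed point rather than converge exactly; a finite-time analysis quantifies how that error behaves after a given number of steps.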
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
Methods: Q-Learning
