Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

Donghwan Lee

arXiv:2306.05700 · eess.SY · June 13, 2023 · 1 citation

TL;DR

This paper provides a finite-time convergence analysis of minimax Q-learning in two-player zero-sum Markov games using a switching system approach, bridging control theory and reinforcement learning.

Contribution

It introduces a novel switching system framework for analyzing minimax Q-learning and value iteration, offering clearer insights into their convergence properties.

Findings

01. Finite-time convergence guarantees for minimax Q-learning.

02. Enhanced understanding of the relationship between control theory and reinforcement learning.

03. Potential for improved algorithms based on the switching system analysis.

Abstract

The objective of this paper is to investigate the finite-time analysis of a Q-learning algorithm applied to two-player zero-sum Markov games. Specifically, we establish a finite-time analysis of both the minimax Q-learning algorithm and the corresponding value iteration method. To enhance the analysis of both value iteration and Q-learning, we employ the switching system model of minimax Q-learning and the associated value iteration. This approach provides further insights into minimax Q-learning and facilitates a more straightforward and insightful convergence analysis. We anticipate that these additional insights have the potential to uncover novel connections between concepts in control theory and reinforcement learning and to foster collaboration between the two communities.
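To make the two algorithms named in the abstract concrete, here is a minimal tabular sketch, not the paper's implementation. It assumes a pure-strategy max-min backup (the maximizer picks an action, then the minimizer responds); Littman's original minimax-Q instead solves a matrix game over mixed strategies at each state. The array layout, the uniform sampling of state-action triples, the constant step size, and all function names are illustrative assumptions.

```python
import numpy as np


def maxmin(Q_s):
    """Pure-strategy max-min value of one state's (A, B) table.

    Assumption: rows are the maximizer's actions, columns the minimizer's.
    """
    return Q_s.min(axis=1).max()


def value_iteration(P, R, gamma, iters=500):
    """Minimax value iteration on a known model.

    P has shape (S, A, B, S) with transition probabilities,
    R has shape (S, A, B) with rewards for the maximizer.
    """
    S = R.shape[0]
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = np.array([maxmin(Q[s]) for s in range(S)])
        Q = R + gamma * (P @ V)  # expectation over next states
    return Q


def minimax_q_learning(P, R, gamma, alpha, steps, rng):
    """Sample-based minimax Q-learning with a constant step size.

    Assumption: (s, a, b) is drawn uniformly at each step, and the
    model P is used only as a simulator for the next state.
    """
    S, A, B = R.shape
    Q = np.zeros((S, A, B))
    for _ in range(steps):
        s, a, b = rng.integers(S), rng.integers(A), rng.integers(B)
        s_next = rng.choice(S, p=P[s, a, b])
        target = R[s, a, b] + gamma * maxmin(Q[s_next])
        Q[s, a, b] += alpha * (target - Q[s, a, b])
    return Q


def random_game(S, A, B, rng):
    """Hypothetical helper: a random game with rewards in [0, 1]."""
    P = rng.random((S, A, B, S))
    P /= P.sum(axis=-1, keepdims=True)
    R = rng.random((S, A, B))
    return P, R
```

A typical check is to build a small game with `random_game`, compute a reference table with `value_iteration`, and track the sup-norm distance between that table and the `minimax_q_learning` iterate. With a constant step size the iterate is expected to settle in a neighborhood of the reference rather than converge exactly, which is the kind of behavior a finite-time bound quantifies.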

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control

Methods: Q-Learning