On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

Yi Wan; Huizhen Yu; Richard S. Sutton

arXiv:2408.16262 · cs.LG · August 30, 2024

TL;DR

This paper extends the convergence analysis of RVI Q-learning algorithms to weakly communicating MDPs, broadening theoretical understanding and practical applicability in average-reward reinforcement learning.

Contribution

The paper generalizes the almost-sure convergence proofs for RVI Q-learning from unichain to weakly communicating MDPs and characterizes the sets to which these algorithms converge.

Findings

01. RVI Q-learning converges almost surely in weakly communicating MDPs.

02. The convergence set is compact and connected, and it consists of solutions to the average-reward optimality equation.

03. Hierarchical RVI algorithms also converge under the weakly communicating assumption.
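
For reference, the average-reward optimality equation named in the findings can be written in action-value form as follows. The symbols ($q$ for action values, $\bar r$ for the reward rate, $r$ for expected rewards, $p$ for transition probabilities) are standard textbook notation assumed here, not notation taken from this page.

```latex
% Average-reward optimality equation, action-value form (assumed notation).
q(s,a) \;=\; r(s,a) \;-\; \bar r \;+\; \sum_{s'} p(s' \mid s,a)\, \max_{a'} q(s',a')
\qquad \text{for all state-action pairs } (s,a).

% If (q, \bar r) is a solution, so is (q + c\,\mathbf{1}, \bar r) for any constant c:
% the constant cancels on both sides, so q is never unique.
```

In a unichain MDP the constant shift is the only freedom in $q$. In the weakly communicating case, as the abstract notes, the solution set has a richer structure with multiple degrees of freedom, which is why the convergence set has to be characterized rather than assumed to be a single point.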

Abstract

This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes (MDPs) under the average-reward criterion. We focus on Q-learning algorithms based on relative value iteration (RVI), which are model-free stochastic analogues of the classical RVI method for average-reward MDPs. These algorithms have low per-iteration complexity, making them well-suited for large state space problems. We extend the almost-sure convergence analysis of RVI Q-learning algorithms developed by Abounadi, Bertsekas, and Borkar (2001) from unichain to weakly communicating MDPs. This extension is important both practically and theoretically: weakly communicating MDPs cover a much broader range of applications compared to unichain MDPs, and their optimality equations have a richer solution structure (with multiple degrees of freedom), introducing additional complexity in proving algorithmic…
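
To make the algorithm family in the abstract concrete, here is a minimal tabular sketch of an RVI Q-learning update. It is an illustration under stated assumptions, not the authors' implementation: the two-state toy MDP, the function name `rvi_q_learning`, the choice of reference function `f(Q) = Q[0, 0]`, the uniform sampling of state-action pairs, and the step-size schedule are all hypothetical choices made for this example, and the paper's exact step-size conditions are not reproduced here.

```python
import numpy as np


def rvi_q_learning(P, R, num_steps=40_000, seed=0):
    """Tabular RVI Q-learning sketch (illustrative, not the paper's code).

    P[s, a, s2] : transition probabilities of a toy model (assumption)
    R[s, a]     : expected one-step rewards (assumption)
    Returns the Q-table and f(Q), which serves as the reward-rate estimate.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))

    def f(q):
        # Reference function: the value of one fixed state-action pair.
        # This is one possible choice, assumed here for simplicity.
        return q[0, 0]

    for _ in range(num_steps):
        # Sample a state-action pair uniformly, then a next state from the model.
        s = rng.integers(n_states)
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states, p=P[s, a])

        # Diminishing per-pair step size, chosen for illustration only.
        visits[s, a] += 1
        alpha = visits[s, a] ** -0.6

        # RVI-style update: subtract f(Q) in place of a discount factor.
        td_error = R[s, a] + Q[s_next].max() - f(Q) - Q[s, a]
        Q[s, a] += alpha * td_error

    return Q, f(Q)


# Hypothetical toy MDP: action 0 stays, action 1 switches state.
# Only "stay in state 1" is rewarded, so the optimal reward rate is 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])

Q, reward_rate = rvi_q_learning(P, R)
print("estimated reward rate:", reward_rate)
print("greedy actions:", Q.argmax(axis=1))
```

The toy problem is deliberately not unichain: the policy that always stays has two separate recurrent classes, yet every state can reach every other state, so the MDP is communicating. That puts it outside the unichain setting and inside the class the paper addresses.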

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Fault Detection and Control Systems · Neural Networks and Applications

Methods: Sparse Evolutionary Training · Q-Learning · Focus