Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration
Huizhen Yu, Yi Wan, Richard S. Sutton

TL;DR
This paper proves that RVI Q-learning, an asynchronous stochastic approximation version of relative value iteration, converges almost surely for finite-space, weakly communicating average-reward semi-Markov decision processes (SMDPs).
Contribution
It establishes the convergence of an asynchronous RVI Q-learning algorithm for SMDPs and introduces new monotonicity conditions for estimating the optimal reward rate.
Findings
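Proves that RVI Q-learning converges almost surely to a compact, connected subset of solutions to the average-reward optimality equation.
Shows convergence to a unique, sample path-dependent solution under additional stepsize and asynchrony conditions.
Introduces new stability and monotonicity conditions, the latter for estimating the optimal reward rate in RVI Q-learning.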
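For reference, the average-reward optimality equation mentioned above is commonly written in action-value form as shown below. The notation is an assumption made here for illustration and is not taken from the paper: $r_{sa}$ is the expected reward, $t_{sa}$ the expected holding time, $p_{ss'}^{a}$ the transition probability, and $\bar r^{*}$ the optimal reward rate.

```latex
q(s,a) \;=\; r_{sa} \;-\; \bar r^{*}\, t_{sa} \;+\; \sum_{s'} p_{ss'}^{a} \max_{a'} q(s',a'),
\qquad \text{for all state-action pairs } (s,a).
```

Adding a constant to every entry of a solution $q$ gives another solution, so the equation has a set of solutions rather than a single one.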
Abstract
This paper applies the authors' recent results on asynchronous stochastic approximation (SA) in the Borkar-Meyn framework to reinforcement learning in average-reward semi-Markov decision processes (SMDPs). We establish the convergence of an asynchronous SA analogue of Schweitzer's classical relative value iteration algorithm, RVI Q-learning, for finite-space, weakly communicating SMDPs. In particular, we show that the algorithm converges almost surely to a compact, connected subset of solutions to the average-reward optimality equation, with convergence to a unique, sample path-dependent solution under additional stepsize and asynchrony conditions. Moreover, to make full use of the SA framework, we introduce new monotonicity conditions for estimating the optimal reward rate in RVI Q-learning. These conditions substantially expand the previously considered algorithmic framework and are…
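A minimal sketch of an RVI Q-learning style update on a toy SMDP may help make the setting concrete. Everything here is an illustrative assumption rather than the paper's exact algorithm: the two-state model (`NEXT`, `REWARD`, `MEAN_HOLD`), the reference-entry choice `f(Q) = Q[0][0]` as the reward-rate estimate, the running-average holding-time estimates `T`, and the stepsize schedule.

```python
import random

# Hypothetical 2-state, 2-action SMDP, used only for illustration.
# NEXT[s][a]: next state (deterministic here), REWARD[s][a]: reward per
# transition, MEAN_HOLD[s][a]: expected holding time of the transition.
NEXT = [[0, 1], [1, 0]]
REWARD = [[1.0, 0.0], [6.0, 0.0]]
MEAN_HOLD = [[1.0, 1.0], [2.0, 1.0]]


def rvi_q_learning(num_steps=40000, seed=0):
    """Asynchronous RVI Q-learning style iteration along one trajectory."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    T = [[1.0, 1.0], [1.0, 1.0]]  # estimates of expected holding times
    visits = [[0, 0], [0, 0]]     # per-pair update counts
    s = 0
    for _ in range(num_steps):
        a = rng.randrange(2)  # uniform random behavior policy
        s_next = NEXT[s][a]
        r = REWARD[s][a]
        # Random holding time with mean MEAN_HOLD[s][a].
        tau = MEAN_HOLD[s][a] * rng.uniform(0.5, 1.5)

        visits[s][a] += 1
        k = visits[s][a]
        # Running average of the observed holding times for (s, a).
        T[s][a] += (tau - T[s][a]) / k

        # Assumed choice of f: the value of a fixed reference entry.
        rate = Q[0][0]
        # Only the visited pair is updated (asynchronous update).
        alpha = 0.5 / k ** 0.6
        Q[s][a] += alpha * (r - rate * T[s][a] + max(Q[s_next]) - Q[s][a])
        s = s_next
    return Q, T, Q[0][0]


if __name__ == "__main__":
    Q, T, rate = rvi_q_learning()
    print("estimated reward rate:", rate)
    print("greedy actions:", [row.index(max(row)) for row in Q])
```

In this toy model the best policy moves to state 1 and stays there, earning a reward of 6 per expected 2 time units, so the reward-rate estimate should settle near 3. The reference-entry form of `f` is only one simple choice; the paper's monotonicity conditions concern which such functions are admissible.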
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Adaptive Dynamic Programming Control
