Concentration bounds for SSP Q-learning for average cost MDPs

Shaan Ul Haque; Vivek Borkar

arXiv:2206.03328·cs.LG·June 14, 2022


TL;DR

This paper derives a concentration bound for a Q-learning algorithm for average cost Markov decision processes (MDPs) that is built on an equivalent stochastic shortest path (SSP) problem, and compares the algorithm numerically with the alternative scheme based on relative value iteration.

Contribution

It derives a concentration bound for SSP Q-learning in average cost MDPs and compares the algorithm numerically with the Q-learning scheme based on relative value iteration.

Findings

01 Derives a concentration bound for SSP Q-learning.

02 Compares SSP Q-learning numerically with the alternative scheme based on relative value iteration.

03 Gives a theoretical guarantee for Q-learning in average cost MDPs.

Abstract

We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
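
To make the abstract's "equivalent shortest path problem" concrete, here is a minimal sketch of one common two-timescale form of SSP Q-learning on a tabular MDP. It is an illustration under stated assumptions, not the paper's exact scheme: the function name `ssp_q_learning`, the toy transition table `P`, the costs `cost`, the reference state `ref_state`, the projection bound `lam_bound`, and the step sizes are all hypothetical choices. The idea is that a running estimate `lam` of the average cost is subtracted from each one-step cost, and any transition into the reference state is treated as termination.

```python
import random

def ssp_q_learning(P, cost, ref_state=0, n_iters=200, lam_bound=10.0, seed=0):
    """Sketch of two-timescale SSP Q-learning (assumed form, toy setting).

    P[i][u] lists next-state probabilities; cost[i][u] is the one-step cost.
    Returns the Q table and the running average-cost estimate lam.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    lam = 0.0
    for n in range(1, n_iters + 1):
        a = 1.0 / n ** 0.6   # faster step size, used for Q
        b = 1.0 / n          # slower step size for lam (assumed, b = o(a))
        Q_old = [row[:] for row in Q]
        for i in range(n_states):
            for u in range(n_actions):
                # Simulate one transition from (i, u).
                nxt = rng.choices(range(n_states), weights=P[i][u])[0]
                # Reaching the reference state ends the SSP episode,
                # so no continuation cost is added in that case.
                cont = 0.0 if nxt == ref_state else min(Q_old[nxt])
                Q[i][u] += a * (cost[i][u] + cont - lam - Q_old[i][u])
        # Slow update of lam, driven by the value at the reference state,
        # then projected onto [-lam_bound, lam_bound].
        lam += b * min(Q_old[ref_state])
        lam = max(-lam_bound, min(lam_bound, lam))
    return Q, lam

# Hypothetical 2-state, 2-action MDP used only for illustration.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.3, 0.7]]]
cost = [[1.0, 2.0], [0.5, 3.0]]
Q, lam = ssp_q_learning(P, cost)
print(Q, lam)
```

A relative value iteration variant would instead subtract a reference value of the Q table inside the Q update itself, which is the alternative scheme the abstract compares against.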

Taxonomy

Topics: Fault Detection and Control Systems

Methods: Q-Learning