Concentration bounds for SSP Q-learning for average cost MDPs
Shaan Ul Haque, Vivek Borkar

TL;DR
This paper establishes a concentration bound for a Q-learning algorithm for average cost Markov decision processes (MDPs) that is based on an equivalent shortest path problem, and compares the algorithm numerically with the alternative scheme based on relative value iteration.
Contribution
It derives a concentration bound for stochastic shortest path (SSP) Q-learning in average cost MDPs and compares the algorithm numerically with the alternative scheme based on relative value iteration.
Findings
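Derives a concentration bound for SSP Q-learning, a Q-learning algorithm for average cost MDPs based on an equivalent shortest path problem.
Compares SSP Q-learning numerically with the alternative scheme based on relative value iteration.
Provides a theoretical guarantee, in the form of this concentration bound, for Q-learning in average cost MDPs.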
Abstract
We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
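The abstract does not spell out the update rule, so the following is only a minimal sketch of SSP Q-learning in the form commonly given in the literature, not necessarily the exact variant analysed in the paper. It assumes a small tabular MDP with known transition probabilities used as a simulator, synchronous updates of every state-action pair, a fixed reference state, and two step-size sequences with the average cost estimate updated on the slower one. The names `ssp_q_learning`, `P`, `cost`, `ref_state`, `a_n`, and `b_n` are all hypothetical choices made for illustration.

```python
import numpy as np


def ssp_q_learning(P, cost, ref_state=0, n_steps=5000, seed=0):
    """Sketch of SSP Q-learning for an average cost MDP (illustrative only).

    P    : array of shape (S, A, S), P[i, u, j] = transition probability.
    cost : array of shape (S, A), one-step cost k(i, u).
    Returns the Q-table and the running estimate of the average cost.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = cost.shape
    Q = np.zeros((n_states, n_actions))
    lam = 0.0                          # estimate of the optimal average cost
    lam_bound = np.abs(cost).max()     # the average cost cannot exceed max |k|

    for n in range(1, n_steps + 1):
        a_n = 1.0 / n ** 0.6           # faster step size for Q (assumed choice)
        b_n = 1.0 / n                  # slower step size for lam, b_n / a_n -> 0

        Q_min = Q.min(axis=1)          # min over actions, from the old iterate
        Q_new = Q.copy()
        for i in range(n_states):
            for u in range(n_actions):
                j = rng.choice(n_states, p=P[i, u])   # simulated next state
                # Reaching the reference state ends the shortest path episode,
                # so no continuation cost is added in that case.
                cont = 0.0 if j == ref_state else Q_min[j]
                Q_new[i, u] += a_n * (cost[i, u] + cont - lam - Q[i, u])
        Q = Q_new

        # Slow update of the average cost estimate, projected onto a
        # bounded interval to keep the iterates stable.
        lam = float(np.clip(lam + b_n * Q[ref_state].min(), -lam_bound, lam_bound))

    return Q, lam
```

Calling `ssp_q_learning(P, cost)` on a small transition array `P` and cost matrix `cost` returns the pair `(Q, lam)`. The relative value iteration alternative mentioned in the abstract would instead subtract a function of the current Q-table in the update, rather than a separately maintained scalar estimate.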
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Q-Learning
