Q-Learning for Continuous State and Action MDPs under Average Cost Criteria

Ali Devran Kara; Serdar Yuksel

arXiv:2308.07591·math.OC·December 10, 2024

Open Access

TL;DR

This paper develops discretization methods and Q-learning algorithms for infinite-horizon average-cost MDPs with continuous state and action spaces, providing convergence guarantees and near-optimality results.

Contribution

It introduces discretization techniques and Q-learning algorithms for continuous spaces, with rigorous convergence analysis and error bounds that relax the total variation condition of prior work to weak or Wasserstein continuity.

Findings

01. Discretization-based approximation with error bounds under ergodicity assumptions: asymptotic convergence under weak continuity, and a convergence rate under Wasserstein continuity, as the grid sizes vanish.

02. Convergence of synchronous and asynchronous Q-learning algorithms in continuous spaces.

03. Near-optimality of solutions obtained via quantized Q-learning algorithms.
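
To make the discretization step concrete, here is a minimal sketch of a uniform quantizer on a one-dimensional interval. It is an illustration only, not the paper's construction: the function names `grid_centers` and `quantize`, the uniform grid, and the interval bounds are all assumptions chosen for simplicity.

```python
import numpy as np

def grid_centers(low, high, n_bins):
    """Representative points (bin centers) of a uniform grid on [low, high]."""
    edges = np.linspace(low, high, n_bins + 1)
    return 0.5 * (edges[:-1] + edges[1:])

def quantize(x, low, high, n_bins):
    """Map a continuous value x to a bin index in {0, ..., n_bins - 1}.

    Hypothetical helper: values outside [low, high] are clipped first,
    and the right endpoint is assigned to the last bin.
    """
    x = np.clip(x, low, high)
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)

centers = grid_centers(0.0, 1.0, 4)
index = quantize(0.3, 0.0, 1.0, 4)
```

With 4 bins on [0, 1], every point lies within half a bin width (0.125) of its representative, which is the kind of grid-size quantity the error bounds are stated in terms of.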

Abstract

For infinite-horizon average-cost criterion problems, there exist relatively few rigorous approximation and reinforcement learning results. In this paper, for Markov Decision Processes (MDPs) with standard Borel spaces, (i) we first provide a discretization-based approximation method for MDPs with continuous spaces under average cost criteria, and provide error bounds for approximations when the dynamics are only weakly continuous (for asymptotic convergence of errors as the grid sizes vanish) or Wasserstein continuous (with a rate in approximation as the grid sizes vanish) under certain ergodicity assumptions. In particular, we relax the total variation condition given in prior work to weak continuity or Wasserstein continuity. (ii) We provide synchronous and asynchronous (quantized) Q-learning algorithms for continuous spaces via quantization (where the quantized state is taken to be…
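
As a rough illustration of what an asynchronous quantized Q-learning loop for an average-cost problem can look like, the sketch below runs a generic relative-value Q-learning update along a single trajectory of a toy one-dimensional system. Everything specific here is an assumption made for the example and is not taken from the paper: the dynamics, the cost, the uniform state and action grids, the uniform exploration policy, the 1/n step sizes, the reference pair `ref`, and the function name `run_quantized_relative_q_learning`. The paper's own algorithms and conditions may differ.

```python
import numpy as np

def run_quantized_relative_q_learning(n_state_bins=10, n_actions=5,
                                      n_steps=20000, seed=0):
    """Relative-value Q-learning on a quantized toy MDP (illustrative only)."""
    rng = np.random.default_rng(seed)
    actions = np.linspace(-1.0, 1.0, n_actions)      # assumed finite action grid
    Q = np.zeros((n_state_bins, n_actions))
    visits = np.zeros((n_state_bins, n_actions))
    ref = (0, 0)                                     # assumed reference state-action pair

    def quantize(x):
        # Uniform quantizer on [0, 1]; the right endpoint goes to the last bin.
        return min(int(x * n_state_bins), n_state_bins - 1)

    x = rng.uniform()                                # continuous state in [0, 1]
    for _ in range(n_steps):
        s = quantize(x)
        a = rng.integers(n_actions)                  # uniform exploration
        u = actions[a]
        cost = (x - 0.5) ** 2 + 0.1 * u ** 2         # assumed stage cost
        # Assumed noisy linear dynamics, clipped to stay in [0, 1].
        x_next = float(np.clip(0.5 * x + 0.3 * u + 0.25 + 0.1 * rng.normal(),
                               0.0, 1.0))
        s_next = quantize(x_next)

        # Asynchronous update: only the visited (s, a) entry changes.
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]
        # Subtracting Q[ref] replaces discounting and keeps the iterates centered.
        target = cost + Q[s_next].min() - Q[ref]
        Q[s, a] += alpha * (target - Q[s, a])
        x = x_next

    greedy_policy = actions[Q.argmin(axis=1)]        # cost-minimizing action per bin
    return Q, greedy_policy

Q, greedy_policy = run_quantized_relative_q_learning()
```

The learner only ever sees the bin index of the state, so the table has `n_state_bins * n_actions` entries regardless of how fine the underlying continuous dynamics are. Refining the grids trades a larger table for a smaller quantization error.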

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Neurological disorders and treatments