Q-Learning for Continuous State and Action MDPs under Average Cost Criteria
Ali Devran Kara, Serdar Yuksel

TL;DR
This paper develops discretization methods and Q-learning algorithms for infinite-horizon average-cost MDPs with continuous state and action spaces, providing convergence guarantees and near-optimality results.
Contribution
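The paper introduces discretization techniques and quantized Q-learning algorithms for continuous state and action spaces, with convergence analysis and error bounds that hold under weak or Wasserstein continuity of the dynamics rather than the total variation condition used in prior work.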
Findings
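Discretization-based approximation with error bounds under ergodicity assumptions: asymptotic convergence under weak continuity, and a rate of convergence under Wasserstein continuity, as the grid sizes vanish.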
Convergence of synchronous and asynchronous Q-learning algorithms in continuous spaces.
Near-optimality of solutions obtained via quantized Q-learning algorithms.
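
As a rough illustration of the findings above, the following is a minimal sketch of Q-learning on a uniformly quantized state and action space under an average cost criterion. It is not the paper's algorithm: it uses the standard relative (reference-state) Q-learning update, and the toy dynamics in `step`, the cost function, the grid sizes, and all function names are hypothetical choices made only for this example.

```python
import random


def quantize(x, n_bins, lo=0.0, hi=1.0):
    """Map a point of [lo, hi] to the index of its uniform bin."""
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def step(x, u, rng):
    """Hypothetical toy dynamics on [0, 1]: move toward the action, plus noise."""
    x_next = 0.5 * x + 0.5 * u + rng.uniform(-0.05, 0.05)
    x_next = min(max(x_next, 0.0), 1.0)
    cost = (x - 0.3) ** 2 + 0.1 * u ** 2  # assumed stage cost
    return x_next, cost


def quantized_relative_q_learning(n_bins=10, n_actions=5, n_steps=50_000, seed=0):
    """Asynchronous relative Q-learning on the quantized model (illustrative only)."""
    rng = random.Random(seed)
    actions = [i / (n_actions - 1) for i in range(n_actions)]  # action grid on [0, 1]
    q = [[0.0] * n_actions for _ in range(n_bins)]
    visits = [[0] * n_actions for _ in range(n_bins)]
    x = rng.random()
    for _ in range(n_steps):
        s = quantize(x, n_bins)
        a = rng.randrange(n_actions)  # uniform exploration over the action grid
        x_next, cost = step(x, actions[a], rng)
        s_next = quantize(x_next, n_bins)
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]  # per-pair decaying step size
        # Subtracting the reference entry q[0][0] replaces discounting;
        # q[0][0] then serves as a running estimate of the average cost.
        target = cost + min(q[s_next]) - q[0][0]
        q[s][a] += alpha * (target - q[s][a])
        x = x_next
    # Greedy (cost-minimizing) action for each quantization bin.
    policy = [actions[min(range(n_actions), key=lambda j: q[s][j])]
              for s in range(n_bins)]
    return q, policy, q[0][0]
```

Calling `quantized_relative_q_learning()` returns the Q-table, a piecewise-constant policy (one action per bin), and the reference entry used as the average cost estimate. Refining `n_bins` and `n_actions` corresponds to the vanishing grid sizes mentioned in the findings, although this sketch makes no claim about the resulting error.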
Abstract
For infinite-horizon average-cost criterion problems, there exist relatively few rigorous approximation and reinforcement learning results. In this paper, for Markov Decision Processes (MDPs) with standard Borel spaces, (i) we first provide a discretization-based approximation method for MDPs with continuous spaces under average cost criteria, and provide error bounds for approximations when the dynamics are only weakly continuous (for asymptotic convergence of errors as the grid sizes vanish) or Wasserstein continuous (with a rate in approximation as the grid sizes vanish) under certain ergodicity assumptions. In particular, we relax the total variation condition given in prior work to weak continuity or Wasserstein continuity. (ii) We provide synchronous and asynchronous (quantized) Q-learning algorithms for continuous spaces via quantization (where the quantized state is taken to be…
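
For reference, the average cost criterion named in the abstract is conventionally written as below. The notation is an assumption, since the excerpt does not show the paper's own symbols: $c$ is the stage cost, $\gamma$ a policy, $x_0$ the initial state, and $\mathcal{T}$ the transition kernel. The second line is one common Lipschitz-type form of Wasserstein continuity, given only as an illustration; the paper's exact condition may differ.

```latex
J(x_0, \gamma) = \limsup_{N \to \infty} \frac{1}{N}\,
  E_{x_0}^{\gamma}\!\left[ \sum_{t=0}^{N-1} c(X_t, U_t) \right],
\qquad
W_1\big(\mathcal{T}(\cdot \mid x, u),\, \mathcal{T}(\cdot \mid x', u)\big)
  \le K \, d(x, x').
```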
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Neurological disorders and treatments
