Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments
Ali Devran Kara, Serdar Yuksel

TL;DR
This paper proves a convergence theorem for Q-learning in general, possibly non-Markovian stochastic environments, and applies it to several stochastic control problems.
Contribution
It presents a convergence theorem for stochastic iterations, in particular Q-learning, in possibly non-Markovian environments, with explicit conditions on the environment and initialization, and discusses applications in stochastic control and multi-agent systems.
Findings
Convergence conditions involve ergodicity and positivity.
Characterization of the limit of Q-learning iterates.
Applications to POMDPs, belief-MDPs, and multi-agent models.
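For orientation, the iterates referred to above take the familiar tabular Q-learning form. The block below is a sketch in assumed notation, not the paper's own statement: the symbols s_t, u_t, c_t, the discount factor \beta, and the learning rates \alpha_t are assumptions, as is the cost-minimization convention. In a non-Markovian environment, s_t need not be a Markov state (for example, it may be a quantized state or a finite window of observations).

```latex
% Standard tabular Q-learning update (cost-minimization convention, assumed notation)
Q_{t+1}(s_t, u_t) = \bigl(1 - \alpha_t(s_t, u_t)\bigr)\, Q_t(s_t, u_t)
  + \alpha_t(s_t, u_t)\,\Bigl( c_t + \beta \min_{v} Q_t(s_{t+1}, v) \Bigr)
```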
Abstract
As a primary contribution, we present a convergence theorem for stochastic iterations, and in particular, Q-learning iterates, under a general, possibly non-Markovian, stochastic environment. Our conditions for convergence involve an ergodicity and a positivity criterion. We provide a precise characterization of the limit of the iterates and conditions on the environment and initializations for convergence. As our second contribution, we discuss the implications and applications of this theorem to a variety of stochastic control problems with non-Markovian environments involving (i) quantized approximations of fully observed Markov Decision Processes (MDPs) with continuous spaces (where quantization breaks down the Markovian structure), (ii) quantized approximations of belief-MDP reduced partially observable MDPs (POMDPs) with weak Feller continuity and a mild version of filter stability…
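Application (i) can be illustrated with a minimal sketch: tabular Q-learning run on the quantized bins of a continuous-state system, so that the learner sees a process that is no longer Markovian. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy dynamics in `step`, the quadratic `cost`, the uniform quantizer, the uniform exploration policy, the visit-count learning rate, and all function names.

```python
import random


def quantize(x, n_bins):
    """Map a state x in [0, 1] to one of n_bins equal-width bin indices."""
    return min(int(x * n_bins), n_bins - 1)


def step(x, u, rng):
    """Hypothetical toy dynamics: drift left (u=0) or right (u=1) plus noise, clipped to [0, 1]."""
    drift = 0.1 if u == 1 else -0.1
    x_next = x + drift + rng.uniform(-0.05, 0.05)
    return min(max(x_next, 0.0), 1.0)


def cost(x, u):
    """Hypothetical stage cost: squared distance of the true state from 0.5."""
    return (x - 0.5) ** 2


def quantized_q_learning(n_bins=5, n_steps=50000, beta=0.9, seed=0):
    """Run Q-learning on bin indices while the true continuous state evolves underneath."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_bins)]     # Q-table indexed by (bin, action)
    visits = [[0, 0] for _ in range(n_bins)]    # visit counts per (bin, action)
    x = rng.random()
    for _ in range(n_steps):
        s = quantize(x, n_bins)
        u = rng.randrange(2)                    # uniform random exploration
        c = cost(x, u)                          # cost depends on the true state
        x_next = step(x, u, rng)
        s_next = quantize(x_next, n_bins)
        visits[s][u] += 1
        alpha = 1.0 / visits[s][u]              # learning rate decays with visit count
        target = c + beta * min(q[s_next])      # cost-minimization target
        q[s][u] = (1.0 - alpha) * q[s][u] + alpha * target
        x = x_next
    return q, visits


if __name__ == "__main__":
    q_table, counts = quantized_q_learning()
    for b, row in enumerate(q_table):
        print(b, [round(v, 3) for v in row], counts[b])
```

The learner only ever sees the bin index, so the pair (bin, action) does not determine the distribution of the next bin. That is the sense in which quantization breaks the Markovian structure. Uniform exploration is used so that every (bin, action) pair keeps being visited. This is only a loose stand-in for the ergodicity and positivity conditions in the abstract, not a verification of them.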
Taxonomy
Topics: Auction Theory and Applications · Economic Policies and Impacts · Advanced Bandit Algorithms Research
Methods: Q-Learning
