Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments

Ali Devran Kara; Serdar Yuksel

arXiv:2311.00123·math.OC·March 5, 2024·6 cites

Open Access

TL;DR

This paper proves a convergence theorem for Q-learning in general, possibly non-Markovian stochastic environments under an ergodicity and a positivity condition, characterizes the limit of the iterates, and applies the result to several stochastic control problems.

Contribution

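It presents a convergence theorem for stochastic iterations, and in particular Q-learning iterates, in possibly non-Markovian environments, with explicit conditions on the environment and the initialization, and discusses applications to stochastic control problems including POMDPs and multi-agent models.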

Findings

01 Convergence conditions involve ergodicity and positivity.

02 Characterization of the limit of Q-learning iterates.

03 Applications to POMDPs, belief-MDPs, and multi-agent models.
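
For orientation, the iterates referred to in these findings can be written in the standard tabular Q-learning form. This page does not reproduce the paper's equations, so the notation below is an assumption: S_t is the variable the learner observes (not necessarily Markov), U_t the action, C_t the stage cost, \beta the discount factor, \alpha_t the learning rates, and a cost-minimization convention is used.

```latex
% Standard tabular Q-learning update (notation assumed, not taken from this page).
% Only the entry for the currently visited pair (s, u) = (S_t, U_t) is updated.
\[
Q_{t+1}(s,u) = \bigl(1-\alpha_t(s,u)\bigr)\,Q_t(s,u)
  + \alpha_t(s,u)\Bigl(C_t + \beta \min_{v} Q_t(S_{t+1}, v)\Bigr),
\qquad \beta \in (0,1).
\]
```

In a Markovian setting this is the familiar stochastic-approximation scheme for the Bellman equation; the findings above concern what such iterates converge to when S_t is not Markov.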

Abstract

As a primary contribution, we present a convergence theorem for stochastic iterations, and in particular, Q-learning iterates, under a general, possibly non-Markovian, stochastic environment. Our conditions for convergence involve an ergodicity and a positivity criterion. We provide a precise characterization of the limit of the iterates and conditions on the environment and initializations for convergence. As our second contribution, we discuss the implications and applications of this theorem to a variety of stochastic control problems with non-Markovian environments involving (i) quantized approximations of fully observed Markov Decision Processes (MDPs) with continuous spaces (where quantization breaks down the Markovian structure), (ii) quantized approximations of belief-MDP reduced partially observable MDPs (POMDPs) with weak Feller continuity and a mild version of filter stability…
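
Application (i) in the abstract can be illustrated with a small sketch: tabular Q-learning run on a quantized version of a continuous-state system, where the learner sees only the bin index, so the observed process is generally not Markov. This is not the paper's algorithm or experiment. The dynamics in `step`, the stage cost in `cost`, the uniform quantizer, the uniformly random exploration policy, and the visit-count learning rate are all hypothetical choices made for illustration.

```python
import random


def quantize(x, n_bins):
    """Map x in [0, 1] to a bin index in {0, ..., n_bins - 1}."""
    return min(int(x * n_bins), n_bins - 1)


def step(x, action, rng):
    """Hypothetical dynamics: drift left or right plus noise, clipped to [0, 1]."""
    drift = 0.1 if action == 1 else -0.1
    x_next = x + drift + rng.uniform(-0.05, 0.05)
    return min(max(x_next, 0.0), 1.0)


def cost(x, action):
    """Hypothetical stage cost: squared distance from the midpoint 0.5."""
    return (x - 0.5) ** 2


def quantized_q_learning(n_bins=10, n_actions=2, beta=0.9,
                         n_steps=200_000, seed=0):
    """Tabular Q-learning where the learner only sees the quantized state."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_bins)]
    visits = [[0] * n_actions for _ in range(n_bins)]
    x = rng.random()  # true continuous state, hidden from the learner
    for _ in range(n_steps):
        s = quantize(x, n_bins)
        a = rng.randrange(n_actions)        # uniformly random exploration
        c = cost(x, a)                      # cost depends on the true state
        x_next = step(x, a, rng)
        s_next = quantize(x_next, n_bins)
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]          # assumed visit-count step size
        target = c + beta * min(q[s_next])  # cost-minimization target
        q[s][a] += alpha * (target - q[s][a])
        x = x_next
    return q


def greedy_policy(q):
    """Cost-minimizing action for each bin under the learned table."""
    return [min(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    table = quantized_q_learning()
    print(greedy_policy(table))
```

Because the cost is smallest at the midpoint, one would expect the greedy policy to move right in the leftmost bins and left in the rightmost ones. The exploration policy keeps every (bin, action) pair visited, which loosely mirrors the kind of ergodicity and positivity conditions the abstract mentions, though the sketch does not verify them.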


Taxonomy

Topics: Auction Theory and Applications · Economic Policies and Impacts · Advanced Bandit Algorithms Research

Methods: Q-Learning