Reinforcement Learning for Discounted and Ergodic Control of Diffusion Processes
Erhan Bayraktar, Ali D. Kara, Somnath Pradhan, Serdar Yuksel

TL;DR
This paper introduces a quantized Q-learning algorithm for optimal control of diffusion processes, providing convergence guarantees and near-optimal policies for both discounted and ergodic cost criteria in continuous time.
Contribution
It develops a novel quantized Q-learning scheme with proven convergence for controlling diffusion processes under both cost criteria, extending reinforcement learning theory to continuous-time diffusions.
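As a rough illustration of a quantized Q-learning update, the minimal Python sketch below keeps a table of Q-values indexed by (quantized state index, action index). It applies the standard cost-minimizing update with visit-count step sizes 1/(1 + N). The class name `QuantizedQLearner`, the step-size rule, and the discrete-time discount factor `beta` are illustrative assumptions, not the authors' exact scheme.

```python
from collections import defaultdict


class QuantizedQLearner:
    """Hypothetical tabular Q-learning on quantized states (cost minimization).

    Q-values are stored per (quantized state index, action index) pair and are
    updated with step size 1 / (1 + number of previous visits to that pair).
    """

    def __init__(self, n_actions: int, beta: float):
        self.n_actions = n_actions
        self.beta = beta                 # assumed discrete-time discount factor, 0 < beta < 1
        self.q = defaultdict(float)      # Q[(state, action)], initialised to 0
        self.visits = defaultdict(int)   # visit counts N[(state, action)]

    def min_q(self, state: int) -> float:
        # Smallest Q-value over all actions at a quantized state.
        return min(self.q[(state, a)] for a in range(self.n_actions))

    def greedy(self, state: int) -> int:
        # Action with the lowest estimated cost-to-go at a quantized state.
        return min(range(self.n_actions), key=lambda a: self.q[(state, a)])

    def update(self, state: int, action: int, cost: float, next_state: int) -> float:
        key = (state, action)
        step = 1.0 / (1 + self.visits[key])  # decreasing, visit-count-based step size
        self.visits[key] += 1
        target = cost + self.beta * self.min_q(next_state)
        self.q[key] = (1 - step) * self.q[key] + step * target
        return self.q[key]
```

The quantizer that produces the integer state indices and the exploration policy that chooses `action` are kept outside the class, so the update can be checked on hand-built transitions.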
Findings
Proves near-optimality of finite-state MDP approximations.
Establishes almost-sure convergence of the Q-learning scheme.
Provides explicit near-optimality rates for diffusion control.
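The finite-state approximation behind the first finding can be pictured in two steps. First, discretize time with a step `h`; an Euler-Maruyama step is assumed here as the discretization. Second, map each continuous state to the nearest point of a finite grid. The one-dimensional sketch below uses hypothetical choices throughout: the functions `drift` and `sigma`, the box `[lower, upper]`, and the clipping of states that fall outside the box.

```python
import math
import random


def euler_maruyama_step(x, u, drift, sigma, h, rng):
    """One Euler-Maruyama step of dX = b(X, U) dt + sigma(X) dW with time step h."""
    noise = rng.gauss(0.0, 1.0)  # standard normal increment, scaled by sqrt(h) below
    return x + drift(x, u) * h + sigma(x) * math.sqrt(h) * noise


def make_quantizer(lower, upper, n_bins):
    """Return a map from a real state to the index of the nearest of n_bins
    equally spaced grid points on [lower, upper] (assumes n_bins >= 2).
    States outside the box are clipped to the nearest endpoint (an assumption)."""
    spacing = (upper - lower) / (n_bins - 1)

    def quantize(x):
        clipped = min(max(x, lower), upper)
        return int(round((clipped - lower) / spacing))

    return quantize


def sample_quantized_transition(x, u, drift, sigma, h, quantize, rng):
    """Simulate one discretized step; return (next continuous state, its grid index)."""
    x_next = euler_maruyama_step(x, u, drift, sigma, h, rng)
    return x_next, quantize(x_next)
```

Keeping the continuous state alongside its grid index mirrors the idea that the learner sees only quantized observations while the simulated diffusion keeps evolving in continuous space.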
Abstract
This paper develops a quantized Q-learning algorithm for the optimal control of controlled diffusion processes under both discounted and ergodic (average) cost criteria. We first establish near-optimality of finite-state MDP approximations to discrete-time discretizations of the diffusion, then introduce a quantized Q-learning scheme and prove its almost-sure convergence to near-optimal policies for the finite MDP. These policies, when interpolated to continuous time, are shown to be near-optimal for the original diffusion model under discounted costs and, via a vanishing-discount argument, also under ergodic costs for sufficiently small discount factors. The analysis applies under mild conditions (Lipschitz dynamics, non-degeneracy, bounded continuous costs, and Lyapunov stability for the ergodic case) without requiring prior knowledge of the system dynamics or…
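The vanishing-discount idea used for the ergodic criterion has a simple finite-state analogue. For an irreducible chain obtained by fixing a policy, `(1 - beta) * V_beta` converges to the average cost as `beta` approaches 1. The two-state transition matrix `P` and cost vector `c` below are hypothetical, chosen only to show this numerically; the sketch does not reproduce the paper's continuous-time argument.

```python
import numpy as np


def discounted_value(P, c, beta):
    """Solve V = c + beta * P V for a fixed policy's transition matrix P and cost vector c."""
    n = len(c)
    return np.linalg.solve(np.eye(n) - beta * P, c)


def average_cost(P, c):
    """Average cost pi . c, where pi is the stationary distribution of an irreducible chain P."""
    n = len(c)
    # Stationary distribution: pi (P - I) = 0 together with sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ c)


# Hypothetical two-state chain under a fixed policy, with per-step costs c.
P = np.array([[0.9, 0.1], [0.5, 0.5]])
c = np.array([1.0, 3.0])
rho = average_cost(P, c)

for beta in (0.9, 0.99, 0.999):
    # Each entry of (1 - beta) * V_beta should move toward rho as beta -> 1.
    print(beta, (1 - beta) * discounted_value(P, c, beta), rho)
```

For this chain the stationary distribution is (5/6, 1/6), so the average cost is 4/3. The gap between `(1 - beta) * V_beta` and that value shrinks roughly in proportion to `1 - beta`.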
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Optimization and Variational Analysis
