q-Learning in Continuous Time

Yanwei Jia; Xun Yu Zhou

arXiv:2207.00713·cs.LG·May 7, 2025

TL;DR

This paper extends Q-learning to continuous-time reinforcement learning through the (little) q-function, a first-order approximation of the conventional Q-function. It develops theory and algorithms that are independent of time discretization and demonstrates their effectiveness in simulations.
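
As a rough illustration of what "first-order approximation" means here, the block below uses Q_Δt(t, x, a; π) for the Q-function of applying action a over a short interval of length Δt, starting from state x at time t, and following policy π afterwards. J(t, x; π) denotes the value function of π. This shorthand is our own and may differ from the paper's notation.

```latex
Q_{\Delta t}(t, x, a; \pi) = J(t, x; \pi) + q(t, x, a; \pi)\,\Delta t + o(\Delta t),
\qquad
q(t, x, a; \pi) = \lim_{\Delta t \to 0} \frac{Q_{\Delta t}(t, x, a; \pi) - J(t, x; \pi)}{\Delta t}.
```

As Δt → 0 the zeroth-order term J does not depend on a. This is the sense in which the big Q-function collapses: the action dependence survives only in the rate q.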

Contribution

It introduces a continuous-time q-learning framework in which the q-function and the value function are jointly characterized by martingale conditions. It then uses this framework to derive actor-critic algorithms and connects them to existing methods such as SARSA and policy gradients.
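
The following sketches one way to read the martingale characterization in the on-policy case. The notation is our own assumption and is not quoted from the paper. X_s is the state, a_s the action sampled from π, r the running reward, h the terminal reward at horizon T, β a discount rate, and γ the entropy temperature. Under these assumptions, candidate functions Ĵ and q̂ play the roles of the value function and q-function of π, roughly, when three conditions hold:

- Ĵ(T, x) = h(x);
- the process on the first line is a martingale for every starting point (t, x);
- q̂ satisfies the normalization constraint on the second line.

```latex
e^{-\beta s}\,\hat J(s, X_s) + \int_t^s e^{-\beta u}\,\big[r(u, X_u, a_u) - \hat q(u, X_u, a_u)\big]\,du, \qquad s \in [t, T],
\\
\int_{\mathcal A} \big[\hat q(t, x, a) - \gamma \log \pi(a \mid t, x)\big]\,\pi(a \mid t, x)\,da = 0.
```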

Findings

01

The proposed algorithms perform competitively with existing methods based on policy gradients (PG).

02

The continuous-time q-learning approach is robust to time discretization.

03

Simulation results show improved convergence properties.

Abstract

We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of…
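
Martingale conditions of this kind suggest TD-style stochastic-approximation updates once a trajectory has been observed on a time grid. The sketch below makes several assumptions:

- hypothetical linear parametrizations J_θ(t, x) = θ·φ_J(t, x) and q_ψ(t, x, a) = ψ·φ_q(t, x, a);
- a scalar state and action;
- the parameter gradients serve as test functions.

The names martingale_q_update, phi_j, phi_q and h are illustrative and are not the authors' code. The policy-update step and the normalization constraint on q are both omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def martingale_q_update(
    theta: Vector,
    psi: Vector,
    traj: List[Tuple[float, float, float, float]],  # (t_k, x_k, a_k, r_k) on a uniform grid
    x_terminal: float,
    dt: float,
    phi_j: Callable[[float, float], Vector],         # hypothetical value-function features
    phi_q: Callable[[float, float, float], Vector],  # hypothetical q-function features
    h: Callable[[float], float],                     # terminal reward, so J(T, x) = h(x)
    lr_theta: float = 0.05,
    lr_psi: float = 0.05,
    beta: float = 0.0,
) -> Tuple[Vector, Vector]:
    """One offline pass: accumulate test-function-weighted martingale increments."""
    grad_theta = [0.0] * len(theta)
    grad_psi = [0.0] * len(psi)
    for k, (t, x, a, r) in enumerate(traj):
        fj = phi_j(t, x)
        fq = phi_q(t, x, a)
        j_now = dot(theta, fj)
        if k + 1 < len(traj):
            t_next, x_next = traj[k + 1][0], traj[k + 1][1]
            j_next = dot(theta, phi_j(t_next, x_next))
        else:
            j_next = h(x_terminal)  # use the terminal condition at t = T
        # Discretized increment of e^{-beta s} J(s, X_s) + int e^{-beta u} (r - q) du,
        # rescaled by e^{beta t_k}; at the true (J, q) its conditional mean is
        # approximately zero.
        delta = math.exp(-beta * dt) * j_next - j_now + (r - dot(psi, fq)) * dt
        # Test functions: gradients of the linear J and q w.r.t. their parameters.
        for i, f in enumerate(fj):
            grad_theta[i] += f * delta
        for i, f in enumerate(fq):
            grad_psi[i] += f * delta
    new_theta = [th + lr_theta * g for th, g in zip(theta, grad_theta)]
    new_psi = [p + lr_psi * g for p, g in zip(psi, grad_psi)]
    return new_theta, new_psi


# Toy usage with made-up features and a one-step trajectory (illustrative only).
theta, psi = martingale_q_update(
    theta=[0.0], psi=[0.0],
    traj=[(0.0, 0.0, 1.0, 1.0)], x_terminal=0.0, dt=0.1,
    phi_j=lambda t, x: [1.0], phi_q=lambda t, x, a: [a], h=lambda x: 0.0,
)
```

With these toy inputs, both parameters move from 0 to about 0.005. Using the parameter gradients as test functions gives a TD(0)-like update. In principle, other test functions could be paired with the same martingale condition.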

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control

Methods: Diffusion · Q-Learning