A random measure approach to reinforcement learning in continuous time
Christian Bender, Nguyen Tran Thuan

TL;DR
This paper introduces a novel random measure framework for modeling exploration in continuous-time reinforcement learning, enabling better theoretical analysis and algorithm development for controlled diffusion and jump processes.
Contribution
It develops a limit theorem for grid-sampled random measures, connecting discrete sampling to continuous-time models in RL with jumps and diffusion.
Findings
Proves a limit theorem for the grid-sampled random measures as the mesh size of the sampling grid goes to zero, which yields the grid-sampling limit SDE.
Shows the limit SDE can replace existing exploratory models.
Provides a foundation for analyzing and designing RL algorithms in continuous time.
Abstract
We present a random measure approach for modeling exploration, i.e., the execution of measure-valued controls, in continuous-time reinforcement learning (RL) with controlled diffusion and jumps. First, we consider the case when sampling the randomized control in continuous time takes place on a discrete-time grid and reformulate the resulting stochastic differential equation (SDE) as an equation driven by suitable random measures. The construction of these random measures makes use of the Brownian motion and the Poisson random measure (which are the sources of noise in the original model dynamics) as well as the additional random variables, which are sampled on the grid for the control execution. Then, we prove a limit theorem for these random measures as the mesh-size of the sampling grid goes to zero, which leads to the grid-sampling limit SDE that is jointly driven by white noise…
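To make the grid-sampling setup in the abstract concrete, the following is a minimal sketch of an Euler-Maruyama simulation in which a randomized control is sampled only at the points of a discrete-time grid and held fixed in between. It is an illustration under stated assumptions, not the paper's construction: the names `simulate_grid_sampled`, `drift`, `diffusion`, and `sample_action`, the Gaussian policy, and all coefficients are hypothetical, and the jump part driven by the Poisson random measure is omitted for brevity.

```python
import math
import random


def simulate_grid_sampled(x0, T, n_grid, n_sub, drift, diffusion, sample_action, rng):
    """Simulate one path of a controlled diffusion with grid-sampled actions.

    The action is drawn from the randomized policy at each grid point and
    then frozen until the next grid point; the state is advanced with
    Euler-Maruyama steps on a finer time mesh inside each grid cell.
    """
    h = T / n_grid   # mesh size of the sampling grid
    dt = h / n_sub   # finer Euler step inside each grid cell
    x = x0
    path = [x0]
    for _ in range(n_grid):
        a = sample_action(x, rng)  # additional randomness, sampled on the grid
        for _ in range(n_sub):
            dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
            x = x + drift(x, a) * dt + diffusion(x, a) * dw
            path.append(x)
    return path


# Hypothetical model ingredients, chosen only for illustration.
def drift(x, a):
    return a - x


def diffusion(x, a):
    return 0.5


def sample_action(x, rng):
    # Hypothetical Gaussian policy centered at -x with unit variance.
    return rng.gauss(-x, 1.0)


path = simulate_grid_sampled(1.0, 1.0, 50, 10, drift, diffusion, sample_action, random.Random(0))
```

Increasing `n_grid` shrinks the mesh size `h`, which is the regime the limit theorem addresses; the sketch only simulates the pre-limit dynamics and makes no claim about the form of the limit SDE.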
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics
Methods: Diffusion
