On Bellman equations for continuous-time policy evaluation I: discretization and approximation
Wenlong Mou, Yuhua Zhu

TL;DR
This paper introduces algorithms for continuous-time policy evaluation built on numerical discretization schemes, achieving high-order accuracy and a bounded approximation factor even as the effective horizon diverges to infinity.
Contribution
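The authors develop a new class of discretization-based algorithms for policy evaluation in continuous-time RL. The algorithms are compatible with discrete-time RL methods that use function approximation, and they come with guarantees on both numerical accuracy and approximation error.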
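For orientation, the block below writes out the standard first-order discretization that such schemes refine. It is a textbook derivation, not the paper's high-order scheme, and the notation is assumed rather than taken from the paper: a diffusion $X_t$, reward $r$, discount rate $\beta > 0$ and observation spacing $\Delta$. The first line is exact by the Markov property. Replacing the short-time integral by $\Delta\, r(x)$ costs $O(\Delta^2)$ per step, which accumulates to $O(\Delta)$ over the roughly $1/(\beta\Delta)$ effective steps, assuming sufficient regularity of $r$ and the diffusion. Note that the discount factor $e^{-\beta\Delta}$ tends to 1 as $\Delta \to 0$. The discrete-time effective horizon $1/(1 - e^{-\beta\Delta}) \approx 1/(\beta\Delta)$ therefore diverges, which is plausibly the regime the abstract refers to.

```latex
V(x) = \mathbb{E}\Big[\int_0^\infty e^{-\beta t}\, r(X_t)\,dt \;\Big|\; X_0 = x\Big]
     = \mathbb{E}\Big[\int_0^\Delta e^{-\beta t}\, r(X_t)\,dt \;\Big|\; X_0 = x\Big]
       + e^{-\beta\Delta}\,\mathbb{E}\big[V(X_\Delta) \mid X_0 = x\big]

% First-order (left-endpoint) approximation of the short-time integral:
V_\Delta(x) = \Delta\, r(x) + e^{-\beta\Delta}\,\mathbb{E}\big[V_\Delta(X_\Delta) \mid X_0 = x\big],
\qquad V_\Delta(x) - V(x) = O(\Delta)
```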
Findings
High-order numerical accuracy achieved
Approximation factor bounded independently of the effective horizon
Compatible with discrete-time RL with function approximation
Abstract
We study the problem of computing the value function from a discretely-observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as approximation error guarantees for the proposed approach. In contrast to discrete-time RL problems where the approximation factor depends on the effective horizon, we obtain a bounded approximation factor using the underlying elliptic structures, even if the effective horizon diverges to infinity.
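The pipeline in the abstract can be illustrated with a minimal sketch: take a discretely observed diffusion trajectory, form a discretized Bellman equation and solve it with a standard discrete-time RL method using linear function approximation. The sketch swaps in the plain first-order equation $V(x) \approx \Delta\, r(x) + e^{-\beta\Delta}\,\mathbb{E}[V(X_\Delta)]$ and textbook LSTD. It does not reproduce the paper's higher-order schemes. Everything concrete here is a hypothetical test case chosen because its value function has a closed form. This includes the Ornstein-Uhlenbeck process $dX = -\theta X\,dt + \sigma\,dW$, the reward $r(x) = x^2$, the features $\phi(x) = [1, x^2]$ and the helper names `simulate_ou`, `features` and `lstd_first_order`.

```python
import numpy as np


def simulate_ou(theta, sigma, dt, n_steps, x0=0.0, seed=0):
    """Sample dX = -theta*X dt + sigma dW at spacing dt using its exact Gaussian transition."""
    rng = np.random.default_rng(seed)
    a = np.exp(-theta * dt)                                # mean-reversion factor over one step
    s = sigma * np.sqrt((1.0 - a**2) / (2.0 * theta))      # exact one-step standard deviation
    noise = rng.standard_normal(n_steps) * s
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = a * x[k] + noise[k]
    return x


def features(x):
    """Hypothetical feature map phi(x) = [1, x^2]; shape (n, 2)."""
    return np.stack([np.ones_like(x), x**2], axis=1)


def lstd_first_order(x, reward, beta, dt, feature_fn):
    """LSTD on the first-order discretized Bellman equation
    V(x) ~= dt * r(x) + exp(-beta*dt) * E[V(X_dt) | X_0 = x].
    Returns linear weights w with V(x) ~= feature_fn(x) @ w."""
    gamma = np.exp(-beta * dt)                             # discrete discount; tends to 1 as dt -> 0
    phi = feature_fn(x[:-1])
    phi_next = feature_fn(x[1:])
    A = phi.T @ (phi - gamma * phi_next) / len(phi)
    b = phi.T @ (dt * reward(x[:-1])) / len(phi)
    return np.linalg.solve(A, b)


# Usage on the hypothetical OU example.
theta, sigma, beta, dt = 1.0, 1.0, 1.0, 0.01
x = simulate_ou(theta, sigma, dt, n_steps=400_000, seed=0)
w = lstd_first_order(x, lambda y: y**2, beta, dt, features)

# Closed form for this example: V(x) = (x^2 + sigma^2/beta) / (beta + 2*theta).
w_true = np.array([sigma**2 / (beta * (beta + 2 * theta)), 1.0 / (beta + 2 * theta)])
print("LSTD weights:", w, "exact weights:", w_true)
```

The true value function lies in the span of the chosen features, so the remaining gap comes from two sources: the $O(\Delta)$ bias of the first-order scheme and sampling noise. Higher-order schemes of the kind the paper develops target the first of these.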
Taxonomy
Topics: Simulation Techniques and Applications · Stochastic processes and financial applications
Methods: Diffusion
