On Bellman equations for continuous-time policy evaluation I: discretization and approximation

Wenlong Mou; Yuhua Zhu

arXiv:2407.05966 · cs.LG · July 9, 2024

TL;DR

This paper introduces discretization-based algorithms for continuous-time policy evaluation that achieve high-order numerical accuracy and an approximation factor that stays bounded even as the effective horizon diverges.

Contribution

The authors develop a new class of discretization-based algorithms for continuous-time policy evaluation that come with error guarantees and are compatible with discrete-time RL methods that use function approximation.

Findings

01. High-order numerical accuracy

02. Approximation factor that stays bounded independently of the effective horizon

03. Compatibility with discrete-time RL with function approximation
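To make "high-order accuracy" concrete, here is a minimal sketch under assumptions of our own; it is not the paper's scheme. The test problem (an Ornstein-Uhlenbeck process dX = -θX dt + σ dW with reward r(x) = x² and discount rate β), the two quadrature rules, and all function names are hypothetical. A discretized Bellman equation V = R + γ E[V(X_Δt) | X_0 = x], with γ = e^{-βΔt}, replaces the discounted reward integral over one step by a quadrature rule: the left-endpoint rule should give an O(Δt) error in V, the trapezoidal rule O(Δt²). Because the OU transition maps quadratics to quadratics, both fixed points can be solved exactly, so the printout isolates discretization error from sampling noise.

```python
import math

# Hypothetical test problem (our choice, not from the paper):
#   dX_t = -THETA * X_t dt + SIGMA dW_t,  reward r(x) = x^2,  discount rate BETA,
#   V(x) = E[ int_0^inf exp(-BETA t) r(X_t) dt | X_0 = x ].
THETA, SIGMA, BETA = 1.0, 1.0, 0.5


def exact_coeffs():
    """Closed-form V(x) = w0 + w1 * x^2, obtained from BETA * V - L V = r."""
    w1 = 1.0 / (BETA + 2.0 * THETA)
    w0 = SIGMA**2 * w1 / BETA
    return w0, w1


def discretized_coeffs(dt, scheme):
    """Fixed point of V = R + gamma * E[V(X_dt) | X_0 = x] within span{1, x^2}.

    scheme "left":      R(x) = dt * r(x)
    scheme "trapezoid": R(x) = dt/2 * (r(x) + gamma * E[r(X_dt) | X_0 = x])
    """
    gamma = math.exp(-BETA * dt)
    one_minus_gamma = -math.expm1(-BETA * dt)
    # gamma * m^2 with m = exp(-THETA dt), and 1 - gamma * m^2 computed stably
    one_minus_gm2 = -math.expm1(-(BETA + 2.0 * THETA) * dt)
    gm2 = 1.0 - one_minus_gm2
    # Exact OU moment: E[X_dt^2 | x] = m^2 x^2 + s2
    s2 = SIGMA**2 * -math.expm1(-2.0 * THETA * dt) / (2.0 * THETA)
    if scheme == "left":
        w1 = dt / one_minus_gm2
        w0 = gamma * s2 * w1 / one_minus_gamma
    elif scheme == "trapezoid":
        w1 = 0.5 * dt * (1.0 + gm2) / one_minus_gm2
        w0 = gamma * s2 * (0.5 * dt + w1) / one_minus_gamma
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return w0, w1


def sup_error(dt, scheme, x_max=2.0):
    """Sup-norm error of the discretized value function on [-x_max, x_max]."""
    (a0, a1), (b0, b1) = discretized_coeffs(dt, scheme), exact_coeffs()
    e0, e1 = a0 - b0, a1 - b1
    # e0 + e1 * x^2 is monotone in x^2, so the max sits at x = 0 or |x| = x_max.
    return max(abs(e0), abs(e0 + e1 * x_max**2))


step_sizes = [0.02, 0.01, 0.005]
orders = {}
for scheme in ("left", "trapezoid"):
    errs = [sup_error(dt, scheme) for dt in step_sizes]
    # Observed order: log2 of the error ratio each time dt is halved.
    orders[scheme] = [math.log2(errs[i] / errs[i + 1]) for i in range(len(errs) - 1)]
    print(scheme, ["%.2e" % e for e in errs], ["%.2f" % p for p in orders[scheme]])
```

The printed orders should come out close to 1 for the left-endpoint rule and close to 2 for the trapezoidal rule on this smooth test problem.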

Abstract

We study the problem of computing the value function from a discretely observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as approximation error guarantees for the proposed approach. In contrast to discrete-time RL problems, where the approximation factor depends on the effective horizon, we obtain a bounded approximation factor by exploiting the underlying elliptic structures, even if the effective horizon diverges to infinity.
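One way to read "compatible with discrete-time RL with function approximation" is that a discretized Bellman equation can be fed to a standard discrete-time estimator as is. As a hedged illustration, the sketch below pairs the trapezoidal per-step reward with least-squares temporal difference (LSTD) on a single discretely observed trajectory. The pairing with LSTD, the hypothetical Ornstein-Uhlenbeck test problem, the feature map (1, x²), the step size Δt, and all function names are our assumptions, not the paper's algorithm.

```python
import math

import numpy as np

# Hypothetical OU test problem (our assumption, not the paper's):
#   dX_t = -THETA * X_t dt + SIGMA dW_t,  reward r(x) = x^2,  discount rate BETA.
THETA, SIGMA, BETA = 1.0, 1.0, 0.5
DT, N_STEPS = 0.05, 400_000


def simulate_ou(n_steps, dt, rng, x0=0.0):
    """Discretely observed OU path, sampled with the exact Gaussian transition."""
    m = math.exp(-THETA * dt)
    s = SIGMA * math.sqrt(-math.expm1(-2.0 * THETA * dt) / (2.0 * THETA))
    path = [x0]
    for xi in rng.standard_normal(n_steps).tolist():
        path.append(m * path[-1] + s * xi)
    return np.array(path)


def features(x):
    """Linear features phi(x) = (1, x^2); for this problem they contain the true V."""
    return np.column_stack([np.ones_like(x), x**2])


def lstd_value(x, dt, scheme="trapezoid"):
    """LSTD weights for the discretized Bellman equation on one trajectory.

    Per-step reward R_k approximates int_0^dt exp(-BETA s) r(X_{t_k + s}) ds:
      "left":      dt * r(x_k)
      "trapezoid": dt/2 * (r(x_k) + gamma * r(x_{k+1}))
    """
    gamma = math.exp(-BETA * dt)
    r = x**2
    phi, phi_next = features(x[:-1]), features(x[1:])
    if scheme == "left":
        rewards = dt * r[:-1]
    elif scheme == "trapezoid":
        rewards = 0.5 * dt * (r[:-1] + gamma * r[1:])
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Solve sum_k phi_k (phi_k - gamma phi_{k+1})^T w = sum_k phi_k R_k
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


path = simulate_ou(N_STEPS, DT, np.random.default_rng(0))
w_hat = lstd_value(path, DT)
# Closed-form coefficients of V(x) = w0 + w1 * x^2 for this test problem
w_exact = np.array([SIGMA**2 / (BETA * (BETA + 2.0 * THETA)), 1.0 / (BETA + 2.0 * THETA)])
print("LSTD estimate (w0, w1):", w_hat, " exact:", w_exact)
```

Passing `scheme="left"` swaps in the first-order reward. Because these features span a space the OU transition preserves, each variant estimates its own discretized fixed point, so any gap to `w_exact` mixes discretization bias with sampling noise.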

Taxonomy

Topics: Simulation Techniques and Applications · Stochastic processes and financial applications

Methods: Diffusion