Temporal Difference Learning with Continuous Time and State in the Stochastic Setting

Ziad Kobeissi (SIERRA); Francis Bach (SIERRA; DI-ENS; PSL)

arXiv:2202.07960·cs.LG·June 8, 2023

TL;DR

This paper introduces two new variants of TD(0) for continuous-time stochastic processes, providing theoretical convergence proofs and demonstrating their application to solving linear PDEs and BSDEs.

Contribution

The paper presents original model-free and model-based TD(0) methods with convergence guarantees for continuous-time policy evaluation in stochastic settings.

Findings

01. Proven convergence rates for both methods
02. Numerical simulations confirm theoretical results
03. Methods can approximate solutions to linear PDEs and BSDEs
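
The link between policy evaluation and linear PDEs in the third finding can be made concrete with a standard Feynman-Kac identity. The block below is a sketch under assumed notation that is not taken from the paper (drift b, diffusion σ, reward r, discount rate ρ > 0, infinite horizon), and it presumes enough regularity for the value function to be twice differentiable. The paper's own setting and assumptions may differ.

```latex
% Assumed notation (illustrative, not the paper's): drift b, diffusion \sigma,
% reward r, discount rate \rho > 0, Brownian motion W.
\[
\begin{aligned}
dX_t &= b(X_t)\,dt + \sigma(X_t)\,dW_t, \\
V(x) &= \mathbb{E}\left[\int_0^\infty e^{-\rho t}\, r(X_t)\,dt \,\middle|\, X_0 = x\right], \\
\rho V(x) &= r(x) + b(x)\cdot\nabla V(x)
            + \tfrac{1}{2}\operatorname{Tr}\!\left(\sigma(x)\sigma(x)^{\top}\nabla^2 V(x)\right).
\end{aligned}
\]
```

Under these assumptions, learning V from observed trajectories amounts to approximating the solution of the last equation, which is linear in V.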

Abstract

We consider the problem of continuous-time policy evaluation. This consists in learning through observations the value function associated with an uncontrolled continuous-time stochastic dynamic and a reward function. We propose two original variants of the well-known TD(0) method using vanishing time steps. One is model-free and the other is model-based. For both methods, we prove theoretical convergence rates that we subsequently verify through numerical simulations. Alternatively, those methods can be interpreted as novel reinforcement learning approaches for approximating solutions of linear PDEs (partial differential equations) or linear BSDEs (backward stochastic differential equations).
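
To make the setting concrete, here is a minimal sketch of a plain discretized TD(0) update. It is not either of the paper's two variants. All choices are assumptions made for illustration: an Ornstein-Uhlenbeck process simulated with Euler-Maruyama, the reward r(x) = x^2, linear features (1, x^2), a fixed time step, a constant learning rate, and tail averaging of the weights to reduce noise. The names `td0_ou` and `exact_coeffs` are hypothetical. This toy problem was chosen because its value function has the closed form V(x) = c + a x^2, so the learned weights can be checked.

```python
import numpy as np


def td0_ou(theta=1.0, sigma=1.0, rho=1.0, dt=0.01,
           n_steps=400_000, lr=0.01, seed=0):
    """Discretized TD(0) with features (1, x^2) on an OU process.

    Illustrative assumptions: dX = -theta * X dt + sigma dW, reward r(x) = x^2,
    discount rate rho. Returns tail-averaged weights (w0, w1) such that
    V(x) is approximated by w0 + w1 * x**2.
    """
    rng = np.random.default_rng(seed)
    dw = np.sqrt(dt) * rng.standard_normal(n_steps)  # Brownian increments
    gamma = np.exp(-rho * dt)                        # discount over one step
    w0 = w1 = 0.0
    avg0 = avg1 = 0.0
    n_avg = 0
    x = 0.0
    for k in range(n_steps):
        x_next = x - theta * x * dt + sigma * dw[k]  # Euler-Maruyama step
        v = w0 + w1 * x * x
        v_next = w0 + w1 * x_next * x_next
        delta = x * x * dt + gamma * v_next - v      # TD error for one step
        w0 += lr * delta                             # semi-gradient: dV/dw0 = 1
        w1 += lr * delta * x * x                     # semi-gradient: dV/dw1 = x^2
        if k >= n_steps // 2:                        # average the second half only
            n_avg += 1
            avg0 += (w0 - avg0) / n_avg
            avg1 += (w1 - avg1) / n_avg
        x = x_next
    return float(avg0), float(avg1)


def exact_coeffs(theta=1.0, sigma=1.0, rho=1.0):
    """Closed-form (c, a) with V(x) = c + a * x**2 for the toy problem above."""
    a = 1.0 / (rho + 2.0 * theta)
    c = sigma ** 2 * a / rho
    return c, a
```

Calling `td0_ou()` and comparing the result with `exact_coeffs()` gives a rough check. With a fixed `dt` and a constant `lr`, the estimate keeps a small bias from both the time discretization and the step size. The vanishing time steps mentioned in the abstract are aimed at the first of these.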


Taxonomy

Topics: Energy Efficiency and Management · Simulation Techniques and Applications · Reinforcement Learning in Robotics

Methods: Linear Regression