Temporal Difference Learning with Continuous Time and State in the Stochastic Setting
Ziad Kobeissi (SIERRA), Francis Bach (SIERRA, DI-ENS, PSL)

TL;DR
This paper introduces two new variants of TD(0) for continuous-time stochastic processes, providing theoretical convergence proofs and demonstrating their application to solving linear PDEs and BSDEs.
Contribution
The paper presents original model-free and model-based TD(0) methods with convergence guarantees for continuous-time policy evaluation in stochastic settings.
Findings
Proven convergence rates for both methods
Numerical simulations confirm theoretical results
Methods can approximate solutions to linear PDEs and BSDEs
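The PDE connection in the last finding can be written out under a standard setting, which is assumed here and may differ in detail from the paper's: a diffusion with drift b and volatility σ, a reward r, and a discount rate ρ > 0 (all symbols are illustrative). The value function defined as a discounted expected reward then solves a linear elliptic PDE, by the Feynman-Kac formula:

```latex
% Assumed setting: uncontrolled diffusion, discounted infinite horizon.
dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t,
\qquad
V(x) = \mathbb{E}\!\left[\int_0^\infty e^{-\rho t}\, r(X_t)\,dt \;\middle|\; X_0 = x\right].

% Linear PDE satisfied by V (when V is smooth enough):
\rho V(x) = r(x) + b(x)\cdot\nabla V(x)
          + \tfrac{1}{2}\,\mathrm{Tr}\!\left(\sigma(x)\sigma(x)^{\top}\nabla^2 V(x)\right).
```

Learning V from sampled trajectories is therefore one way of approximating the solution of this equation without discretizing the state space.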
Abstract
We consider the problem of continuous-time policy evaluation. This consists in learning through observations the value function associated with an uncontrolled continuous-time stochastic dynamic and a reward function. We propose two original variants of the well-known TD(0) method using vanishing time steps. One is model-free and the other is model-based. For both methods, we prove theoretical convergence rates that we subsequently verify through numerical simulations. Alternatively, those methods can be interpreted as novel reinforcement learning approaches for approximating solutions of linear PDEs (partial differential equations) or linear BSDEs (backward stochastic differential equations).
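To make the policy-evaluation setup concrete, here is a minimal sketch of plain discretized TD(0) with a small time step and linear features. It is not either of the paper's two variants. Every concrete choice is an assumption made for illustration: the Ornstein-Uhlenbeck dynamics, the reward r(x) = x², the features [1, x²], the Euler-Maruyama step, the mini-batch, and the iterate averaging. The function name `td0_ou` and its parameters are hypothetical.

```python
import numpy as np


def td0_ou(dt=0.01, n_steps=5000, batch=256, lr=0.01,
           theta=1.0, sigma=1.0, rho=1.0, seed=0):
    """Discretized TD(0) for dX = -theta*X dt + sigma dW with reward x**2.

    Illustrative sketch only: the value function is modeled as
    V(x) ~ w[0] + w[1] * x**2, and the averaged weights are returned.
    """
    rng = np.random.default_rng(seed)
    gamma = np.exp(-rho * dt)                   # discount over one time step
    stat_std = sigma / np.sqrt(2.0 * theta)     # stationary std of the OU process

    def features(x):
        # Linear features [1, x^2], shape (batch, 2).
        return np.stack([np.ones_like(x), x ** 2], axis=1)

    w = np.zeros(2)
    w_avg = np.zeros(2)
    n_avg = 0
    for k in range(n_steps):
        # Sample states from the stationary law, then take one Euler-Maruyama step.
        x = stat_std * rng.standard_normal(batch)
        noise = rng.standard_normal(batch)
        x_next = x - theta * x * dt + sigma * np.sqrt(dt) * noise

        phi, phi_next = features(x), features(x_next)
        # TD error over a step of length dt: the reward accrues at rate r(x).
        delta = x ** 2 * dt + gamma * (phi_next @ w) - phi @ w
        # Rescale by 1/dt so the update does not vanish as dt -> 0.
        w += lr * (phi * (delta / dt)[:, None]).mean(axis=0)

        # Average the iterates over the second half of training to reduce noise.
        if k >= n_steps // 2:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg
    return w_avg


if __name__ == "__main__":
    theta, sigma, rho = 1.0, 1.0, 1.0
    w = td0_ou(theta=theta, sigma=sigma, rho=rho)
    # Closed form for this toy problem:
    # V(x) = sigma^2 / (rho * (rho + 2*theta)) + x^2 / (rho + 2*theta).
    exact = np.array([sigma ** 2 / (rho * (rho + 2 * theta)),
                      1.0 / (rho + 2 * theta)])
    print("learned:", w, "closed form:", exact)
```

In this sketch the stochastic part of the TD error is of order √dt, so dividing by dt makes the update noise grow like 1/√dt as the time step shrinks. The mini-batch and the averaging are there only to tame that noise in the toy example.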
Taxonomy
Topics: Energy Efficiency and Management · Simulation Techniques and Applications · Reinforcement Learning in Robotics
Methods: Linear Regression
