Value Function Estimators for Feynman-Kac Forward-Backward SDEs in Stochastic Optimal Control
Kelsey P. Hawkins, Ali Pakniyat, Panagiotis Tsiotras

TL;DR
This paper introduces two new numerical estimators for Feynman-Kac FBSDEs in stochastic optimal control, achieving higher accuracy and stability than existing methods, with applications to reinforcement learning.
Contribution
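The paper reverses the usual order of approximation for FBSDEs: rather than discretizing the continuous-time FBSDE, it first builds a discrete-time approximation of the on-policy value function and then derives a discrete-time estimator that resembles the continuous-time one. This improves accuracy and stability over traditional discretization methods.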
Findings
Significant accuracy improvements over Euler-Maruyama estimators.
Near machine-precision accuracy in linear quadratic regulator problems.
Enhanced stability preventing divergence in complex control problems.
Abstract
Two novel numerical estimators are proposed for solving forward-backward stochastic differential equations (FBSDEs) appearing in the Feynman-Kac representation of the value function in stochastic optimal control problems. In contrast to the current numerical approaches which are based on the discretization of the continuous-time FBSDE, we propose a converse approach, namely, we obtain a discrete-time approximation of the on-policy value function, and then we derive a discrete-time estimator that resembles the continuous-time counterpart. The proposed approach allows for the construction of higher accuracy estimators along with error analysis. The approach is applied to the policy improvement step in reinforcement learning. Numerical results and error analysis are demonstrated using (i) a scalar nonlinear stochastic optimal control problem and (ii) a four-dimensional linear quadratic…
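For context on the discretization-based baseline the abstract contrasts against, here is a minimal Python sketch of an Euler-Maruyama Monte Carlo estimate of an on-policy value function through its Feynman-Kac expectation. It is not the paper's proposed estimator and does not include a backward pass. The toy problem (scalar linear drift `a * x`, constant noise `sigma`, terminal cost `x**2`, no running cost) and the names `em_value_estimate` and `exact_value` are assumptions chosen only because this case has a closed-form answer to compare against.

```python
import numpy as np


def em_value_estimate(x0, a, sigma, T, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of E[X_T^2 | X_0 = x0] using Euler-Maruyama.

    Hypothetical toy dynamics: dX = a * X dt + sigma dW.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        # Brownian increments have mean 0 and standard deviation sqrt(dt).
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # One Euler-Maruyama step of the forward SDE.
        x = x + a * x * dt + sigma * dw
    # Feynman-Kac: the value is the expected terminal cost along the paths.
    return float(np.mean(x ** 2))


def exact_value(x0, a, sigma, T):
    """Closed-form E[X_T^2] for the same linear SDE (requires a != 0)."""
    growth = np.exp(2.0 * a * T)
    return float(x0 ** 2 * growth + sigma ** 2 * (growth - 1.0) / (2.0 * a))


if __name__ == "__main__":
    est = em_value_estimate(1.0, -0.5, 0.3, 1.0, n_steps=100, n_paths=200_000)
    ref = exact_value(1.0, -0.5, 0.3, 1.0)
    print(f"Euler-Maruyama estimate: {est:.5f}, closed form: {ref:.5f}")
```

The gap between the two numbers combines time-discretization bias, which shrinks as `n_steps` grows, with Monte Carlo sampling error, which shrinks as `n_paths` grows.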
Taxonomy
Topics: Simulation Techniques and Applications · Stochastic processes and financial applications · Model Reduction and Neural Networks
