Operator Splitting, Policy Iteration, and Machine Learning for Stochastic Optimal Control
Alain Bensoussan, Thien P.B. Nguyen, Minh-Binh Tran, Son N.T. Tu

TL;DR
This paper introduces a splitting method combined with policy iteration and machine learning to efficiently solve stochastic optimal control problems governed by Hamilton-Jacobi equations, with proven convergence and error bounds.
Contribution
It develops a novel splitting approach that reduces complex PDEs to simpler steps, integrating machine learning for value function approximation with established convergence guarantees.
Findings
Error bounds improve with data regularity
Method achieves exponential convergence in value learning
Numerical results demonstrate stability and accuracy
Abstract
We propose a splitting approach to solve the second-order Hamilton-Jacobi equation, reducing it to a heat step and a purely first-order step. The latter is implemented using a gradient value policy iteration algorithm, enabling efficient characteristic-based machine learning methods. We establish convergence rates for the splitting method in terms of the splitting step: error bounds for Lipschitz data, an improved rate for semiconcave data, and an additional error estimate in the periodic setting. For the first-order step, we provide a weighted error analysis that shows exponential convergence. Each iteration solves linear characteristic equations and learns the value function by minimizing a weighted value gradient loss. The approach yields stable…
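To make the heat step / first-order step decomposition concrete, here is a minimal sketch of first-order (Lie) splitting on a one-dimensional periodic grid. Everything specific in it is an assumption for illustration, not taken from the paper: the model equation u_t + 0.5*|u_x|^2 = eps*u_xx + f, the function names `heat_step`, `first_order_step`, and `lie_splitting`, and the grid parameters. In particular, the first-order step below uses a standard monotone upwind finite-difference scheme as a stand-in; it is not the gradient value policy iteration or the learned value function described in the abstract.

```python
import numpy as np

def heat_step(u, eps, tau, length):
    """Advance v_t = eps * v_xx by time tau on a periodic grid (exact in Fourier space)."""
    n = u.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)  # angular wavenumbers
    return np.real(np.fft.ifft(np.exp(-eps * k**2 * tau) * np.fft.fft(u)))

def first_order_step(u, f, tau, dx, cfl=0.4):
    """Advance w_t + 0.5*|w_x|^2 = f by time tau with a monotone upwind scheme.

    Stand-in for the paper's policy iteration step (an assumption of this sketch).
    """
    u = u.copy()
    remaining = tau
    while remaining > 0.0:
        dm = (u - np.roll(u, 1)) / dx    # backward difference, periodic
        dp = (np.roll(u, -1) - u) / dx   # forward difference, periodic
        pmax = max(np.abs(dp).max(), 1e-12)
        dt = min(remaining, cfl * dx / pmax)  # substep limited by a CFL condition
        # Upwind approximation of the convex Hamiltonian 0.5 * p**2
        ham = 0.5 * (np.maximum(dm, 0.0) ** 2 + np.minimum(dp, 0.0) ** 2)
        u += dt * (f - ham)
        remaining -= dt
    return u

def lie_splitting(u0, eps, f, total_time, n_steps, length):
    """Alternate a heat step and a first-order step, each over the splitting step tau."""
    tau = total_time / n_steps
    dx = length / u0.size
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(n_steps):
        u = heat_step(u, eps, tau, length)
        u = first_order_step(u, f, tau, dx)
    return u

if __name__ == "__main__":
    length = 2.0 * np.pi
    x = length * np.arange(128) / 128
    u = lie_splitting(np.sin(x), eps=0.05, f=0.0, total_time=1.0, n_steps=20, length=length)
    print(u.min(), u.max())
```

The splitting step in this sketch is `tau = total_time / n_steps`; halving it and comparing the two results is one simple way to observe how the splitting error behaves on a given problem.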
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Advanced Bandit Algorithms Research
