Operator Splitting, Policy Iteration, and Machine Learning for Stochastic Optimal Control

Alain Bensoussan; Thien P.B. Nguyen; Minh-Binh Tran; Son N.T. Tu

arXiv:2603.12167 · math.OC · March 23, 2026

TL;DR

This paper combines an operator splitting method with policy iteration and machine learning to solve stochastic optimal control problems governed by second-order Hamilton–Jacobi equations, and it proves convergence rates and error bounds for the scheme.

Contribution

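The paper develops a splitting approach that reduces the second-order Hamilton–Jacobi equation to a heat step and a purely first-order step. The first-order step is solved by a gradient value policy iteration in which machine learning approximates the value function, and convergence is established for both the splitting and the iteration.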

Findings

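01. Error bounds improve with data regularity: the $L^{\infty}$ error is bounded between $O(h)$ and $O(h^{1/5})$ for Lipschitz data and improves to $O(h^{1/3})$ for semiconcave data.

02. The first-order step converges exponentially in a weighted $L^{2}$ norm.

03. Numerical results demonstrate stability and accuracy.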

Abstract

We propose a splitting approach to solve the second-order Hamilton–Jacobi equation, reducing it to a heat step and a purely first-order step. The latter is implemented using a gradient value policy iteration algorithm, enabling efficient characteristic-based machine learning methods. We establish convergence rates for the splitting method. In particular, with $h$ the splitting step, the $L^{\infty}$ error is bounded between $O(h)$ and $O(h^{1/5})$ for Lipschitz data, improving to $O(h^{1/3})$ for semiconcave data. In the periodic setting, we also obtain an $L^{1}$ error of order $O(h^{1/2})$. For the first-order step, we provide a weighted $L^{2}$ error analysis that shows exponential convergence. Each iteration solves linear characteristic equations and learns the value function by minimizing a weighted value gradient loss. The approach yields stable…
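To make the heat-step / first-order-step idea concrete, the following is a minimal sketch on a model problem that is an assumption of this note, not taken from the paper: the one-dimensional periodic equation $u_t + \tfrac{1}{2} u_x^2 = \nu u_{xx}$ with $u(x, 0) = \sin x$. The heat step is solved exactly in Fourier space. The first-order step is solved by tracing characteristics back from each grid point, a plain substitute for the paper's gradient value policy iteration and learned value function. The grid size `N`, the viscosity `NU`, the order of the two sub-steps, and all function names are hypothetical choices made for illustration. The Cole-Hopf transform supplies an exact reference solution for this particular Hamiltonian, so the splitting error can be measured for several values of $h$.

```python
import numpy as np

N = 64                                   # grid points (assumed)
NU = 0.1                                 # viscosity (assumed)
x = 2 * np.pi * np.arange(N) / N         # periodic grid on [0, 2*pi)
k = np.fft.fftfreq(N, d=1.0 / N)         # integer wavenumbers


def heat_step(u, h):
    """Solve u_t = NU * u_xx exactly in Fourier space over a time h."""
    return np.real(np.fft.ifft(np.exp(-NU * k**2 * h) * np.fft.fft(u)))


def trig_eval(coeffs, y):
    """Evaluate the trigonometric interpolant sum_k c_k exp(i k y) at y."""
    return np.real(np.exp(1j * np.outer(y, k)) @ coeffs)


def first_order_step(u, h, iters=30):
    """Solve u_t + 0.5 * u_x**2 = 0 over a time h along characteristics.

    Valid while characteristics do not cross, i.e. h * max(-u_xx) < 1.
    """
    c = np.fft.fft(u) / N                # Fourier coefficients of u
    dc = 1j * k * c                      # Fourier coefficients of p = u_x
    x0 = x.copy()
    for _ in range(iters):               # fixed point for x0 + h * p(x0) = x
        x0 = x - h * trig_eval(dc, x0)
    p0 = trig_eval(dc, x0)
    # Along a characteristic, p is constant and u grows by 0.5 * h * p**2.
    return trig_eval(c, x0) + 0.5 * h * p0**2


def split_solve(u0, T, h):
    """Lie splitting: first-order step, then heat step, repeated T/h times."""
    u = u0.copy()
    for _ in range(int(round(T / h))):
        u = heat_step(first_order_step(u, h), h)
    return u


def exact_solution(u0, T):
    """Cole-Hopf reference: u = -2 NU log(phi), phi solves the heat equation."""
    phi0 = np.exp(-u0 / (2 * NU))
    phi = np.real(np.fft.ifft(np.exp(-NU * k**2 * T) * np.fft.fft(phi0)))
    return -2 * NU * np.log(phi)


T = 0.5
steps = (0.1, 0.05, 0.025)
u0 = np.sin(x)
ref = exact_solution(u0, T)
errors = [np.max(np.abs(split_solve(u0, T, h) - ref)) for h in steps]
for h, err in zip(steps, errors):
    print(f"h = {h:<6} sup-norm error = {err:.2e}")
```

Because both sub-steps are solved to near spectral accuracy in space, the printed errors mostly reflect the splitting itself. The data here are smooth, so the observed decay in $h$ should not be read as a check of the Lipschitz or semiconcave rates quoted in the abstract.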

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Advanced Bandit Algorithms Research