Probabilistic Framework of Howard's Policy Iteration: BML Evaluation and Robust Convergence Analysis
Yutian Wang, Yuan-Hua Ni, Zengqiang Chen, Ji-Feng Zhang

TL;DR
This paper introduces a probabilistic framework for Howard's policy iteration based on forward-backward stochastic differential equations (FBSDEs). The framework can be implemented from sample data, and the paper analyzes its convergence, including robustness under practical conditions.
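For orientation, Howard's policy iteration alternates policy evaluation with greedy policy improvement. The following is a minimal sketch of the classical tabular, discrete-time version on a toy two-state problem. It is not the paper's FBSDE formulation. The transition table `P`, reward table `R`, discount `gamma`, and sweep count `eval_sweeps` are all assumptions made for illustration.

```python
def policy_iteration(P, R, gamma=0.9, eval_sweeps=500):
    """Classical tabular policy iteration (illustrative, not the paper's method).

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    policy = [0] * n_states
    while True:
        # Policy evaluation: iterate the Bellman equation of the fixed policy.
        V = [0.0] * n_states
        for _ in range(eval_sweeps):
            V = [
                R[s][policy[s]] + gamma * sum(p * V[t] for p, t in P[s][policy[s]])
                for s in range(n_states)
            ]

        # Policy improvement: act greedily with respect to the evaluated V.
        def q(s, a):
            return R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])

        new_policy = [
            max(range(len(P[s])), key=lambda a, s=s: q(s, a))
            for s in range(n_states)
        ]
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical toy problem: action 0 stays, action 1 switches state.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 1.0], [2.0, 0.0]]
policy, V = policy_iteration(P, R)
```

In this toy problem the iteration settles on switching out of state 0 and staying in state 1. The evaluation step here uses a known model; the paper's interest is in replacing that step with criteria optimized over sample data.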
Contribution
The paper develops an FBSDE-based formulation of policy iteration, proposes the backward-measurability-loss (BML) criterion for solving the resulting equations, and establishes convergence and robustness results.
Findings
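The FBSDE formulation is implemented by optimizing criteria over sample data, which makes it less sensitive to the state dimension than PDE-based formulations.
With specific weight functions, the BML criterion recovers existing methods such as the Deep BSDE method and the martingale approach for BSDEs.
Convergence and robustness of the proposed algorithms are proved, under both ideal and practical conditions.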
Abstract
This paper aims to build a probabilistic framework for Howard's policy iteration algorithm using the language of forward-backward stochastic differential equations (FBSDEs). As opposed to conventional formulations based on partial differential equations, our FBSDE-based formulation can be easily implemented by optimizing criteria over sample data, and is therefore less sensitive to the state dimension. In particular, both on-policy and off-policy evaluation methods are discussed by constructing different FBSDEs. The backward-measurability-loss (BML) criterion is then proposed for solving these equations. By choosing specific weight functions in the proposed criterion, we can recover the popular Deep BSDE method or the martingale approach for BSDEs. The convergence results are established under both ideal and practical conditions, depending on whether the optimization criteria are…
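As an illustration of "optimizing criteria over sample data", the Deep BSDE method mentioned above minimizes a terminal mismatch E|Y_T - g(X_T)|^2 over simulated paths. The sketch below estimates that loss by Monte Carlo on a toy BSDE. It is not the paper's BML criterion or algorithm, and the toy problem is an assumption chosen because its exact solution is known: X = W is a Brownian motion, the driver is zero, and g(x) = x^2, so Y_t = W_t^2 + (T - t), Z_t = 2 W_t, and Y_0 = T.

```python
import math
import random


def terminal_loss(y0, n_paths=2000, n_steps=50, horizon=1.0, seed=0):
    """Monte Carlo estimate of E|Y_T - g(X_T)|^2 for the toy BSDE.

    y0 is the candidate initial value Y_0. The exact Z_t = 2 * X_t is
    plugged in here; in a Deep BSDE setup a trained model would supply it.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = 0.0  # forward state, X_0 = 0
        y = y0   # backward component started from the candidate Y_0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)  # Brownian increment
            z = 2.0 * x                   # exact Z_t for this toy problem
            y += z * dw                   # Euler step of dY = Z dW (zero driver)
            x += dw                       # Euler step of dX = dW
        total += (y - x * x) ** 2         # terminal mismatch with g(x) = x^2
    return total / n_paths


loss_at_true_y0 = terminal_loss(1.0)   # Y_0 = T = 1 is the exact value
loss_at_wrong_y0 = terminal_loss(0.5)  # a deliberately wrong candidate
```

The loss at the exact initial value is small but not zero, because the Euler scheme leaves a discretization residual, while a wrong candidate is penalized by roughly its squared distance from the exact Y_0. Minimizing this quantity over Y_0 and a parametrized Z needs only simulated paths rather than a spatial grid.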
Taxonomy
Topics: Energy, Environment, and Transportation Policies · Energy Load and Power Forecasting · Monetary Policy and Economic Impact
