Policy Iteration for Exploratory Hamilton--Jacobi--Bellman Equations
Hung Vinh Tran, Zhenhua Wang, Yuming Paul Zhang

TL;DR
This paper analyzes the convergence of policy iteration algorithms for entropy-regularized stochastic control problems with both bounded and unbounded coefficients, providing new theoretical guarantees and estimates.
Contribution
It offers the first convergence analysis of PIA for exploratory HJB equations with unbounded coefficients, including well-posedness and quantitative estimates.
Findings
Proved convergence of PIA with bounded coefficients under a smallness condition on the controls applied to the diffusion term.
Established well-posedness of exploratory HJB with unbounded coefficients.
Provided quantitative $\text{C}^{2,\beta}$ and $\text{C}^{1,\beta}$ estimates for value sequences.
Abstract
We study the policy iteration algorithm (PIA) for entropy-regularized stochastic control problems on an infinite time horizon with a large discount rate, focusing on two main scenarios. First, we analyze PIA with bounded coefficients where the controls applied to the diffusion term satisfy a smallness condition. We demonstrate the convergence of PIA based on a uniform estimate for the value sequence generated by PIA, and provide a quantitative convergence analysis for this scenario. Second, we investigate PIA with unbounded coefficients but no control over the diffusion term. In this scenario, we first provide the well-posedness of the exploratory Hamilton--Jacobi--Bellman equation with linear growth coefficients and polynomial growth reward function. By such a well-posedness result we achieve PIA's convergence by establishing a quantitative locally uniform…
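To make the alternation between policy evaluation and policy improvement concrete, here is a minimal sketch on a finite-state, discrete-time analogue. It is not the paper's continuous-time PDE setting, and it does not reproduce the authors' method: the function names, the toy transition table `P`, the rewards `r`, the discount factor `gamma`, and the entropy weight `lam` are all hypothetical choices made for illustration. A small `gamma` loosely plays the role of a large discount rate, and the improvement step uses the Gibbs (softmax) form that is standard for entropy-regularized control.

```python
import math


def evaluate(policy, P, r, gamma, lam, sweeps=200):
    """Policy evaluation: iterate v <- E_pi[r - lam*log(pi) + gamma * P v].

    The -lam*log(pi) term is the entropy bonus; the map is a gamma-contraction,
    so repeated sweeps converge to the regularized value of `policy`.
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            sum(
                p_a * (r[s][a] - lam * math.log(p_a)
                       + gamma * sum(P[s][a][t] * v[t] for t in range(n)))
                for a, p_a in enumerate(policy[s])
            )
            for s in range(n)
        ]
    return v


def q_values(v, P, r, gamma):
    """State-action values q(s, a) = r(s, a) + gamma * sum_t P(t | s, a) v(t)."""
    n = len(r)
    return [
        [r[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
         for a in range(len(r[s]))]
        for s in range(n)
    ]


def gibbs(q_row, lam):
    """Policy improvement: pi(a) proportional to exp(q(a) / lam), computed stably."""
    m = max(q_row)
    w = [math.exp((q - m) / lam) for q in q_row]
    z = sum(w)
    return [x / z for x in w]


def soft_policy_iteration(P, r, gamma=0.5, lam=0.5, n_iter=50):
    """Alternate evaluation and Gibbs improvement, starting from a uniform policy."""
    n, k = len(r), len(r[0])
    policy = [[1.0 / k] * k for _ in range(n)]
    for _ in range(n_iter):
        v = evaluate(policy, P, r, gamma, lam)
        policy = [gibbs(row, lam) for row in q_values(v, P, r, gamma)]
    return evaluate(policy, P, r, gamma, lam), policy


# Hypothetical toy problem: 2 states, 2 actions; P[s][a][t] is a transition probability.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
r = [[1.0, 0.0], [0.0, 2.0]]
v, policy = soft_policy_iteration(P, r)
```

At a fixed point of this iteration the value satisfies the discrete soft Bellman equation `v(s) = lam * log(sum_a exp(q(s, a) / lam))`, which is the finite-state counterpart of an entropy-regularized Bellman equation. The quantitative estimates described in the abstract concern the continuous-time problem and are not captured by this toy example.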
Taxonomy
Topics: Advanced Control Systems Optimization · Optimization and Variational Analysis
