Policy Iteration for Exploratory Hamilton--Jacobi--Bellman Equations

Hung Vinh Tran; Zhenhua Wang; Yuming Paul Zhang

arXiv:2406.00612 · math.OC · May 28, 2025 · 1 citation

TL;DR

This paper analyzes the convergence of policy iteration algorithms for entropy-regularized stochastic control problems with both bounded and unbounded coefficients, proving convergence of the iterates and deriving quantitative regularity estimates for the value functions they generate.

Contribution

It offers the first convergence analysis of the policy iteration algorithm (PIA) for exploratory Hamilton--Jacobi--Bellman (HJB) equations with unbounded coefficients, together with a well-posedness result and quantitative estimates.
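For orientation, the display below records a generic exploratory HJB equation and the two alternating PIA steps as they commonly appear in the entropy-regularized control literature. It is a sketch under assumed notation, not a transcription of the paper: the drift $b$, diffusion $\sigma$, reward $r$, discount rate $\rho$, temperature $\lambda$, action set $U$, and the shorthand $H$ are labels chosen here, and the paper's precise assumptions and conventions may differ.

```latex
% Assumed notation: H(x,u,p,M) = b(x,u)\cdot p + \tfrac12 \operatorname{tr}\big(\sigma\sigma^{\top}(x,u) M\big) + r(x,u)
\rho v(x) = \sup_{\pi \in \mathcal{P}(U)} \int_U \Big[ H\big(x,u,Dv(x),D^2 v(x)\big) - \lambda \ln \pi(u) \Big] \pi(u)\,du
          = \lambda \ln \int_U \exp\Big( \tfrac{1}{\lambda} H\big(x,u,Dv(x),D^2 v(x)\big) \Big)\,du

% Policy evaluation: given \pi^n, solve the (linear) equation for v^n
\rho v^n(x) = \int_U \Big[ H\big(x,u,Dv^n(x),D^2 v^n(x)\big) - \lambda \ln \pi^n(u \mid x) \Big] \pi^n(u \mid x)\,du

% Policy improvement: Gibbs (softmax) update built from v^n
\pi^{n+1}(u \mid x) = \frac{\exp\big( \tfrac{1}{\lambda} H(x,u,Dv^n(x),D^2 v^n(x)) \big)}
                          {\int_U \exp\big( \tfrac{1}{\lambda} H(x,u',Dv^n(x),D^2 v^n(x)) \big)\,du'}
```

The second equality in the first line is the Gibbs variational principle: the supremum is attained by the same Gibbs density used in the improvement step, so each PIA iteration only requires solving a linear equation rather than the fully nonlinear HJB equation.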

Findings

1. Proved convergence of PIA with bounded coefficients under a smallness condition on the controls applied to the diffusion term.
2. Established well-posedness of the exploratory HJB equation with unbounded (linear-growth) coefficients.
3. Provided quantitative $C^{2,\alpha}$ and $C^{1,\alpha}$ estimates for the value sequences generated by PIA.

Abstract

We study the policy iteration algorithm (PIA) for entropy-regularized stochastic control problems on an infinite time horizon with a large discount rate, focusing on two main scenarios. First, we analyze PIA with bounded coefficients where the controls applied to the diffusion term satisfy a smallness condition. We prove the convergence of PIA via a uniform $C^{2,\alpha}$ estimate for the value sequence generated by PIA, and provide a quantitative convergence analysis for this scenario. Second, we investigate PIA with unbounded coefficients but no control over the diffusion term. In this scenario, we first establish the well-posedness of the exploratory Hamilton--Jacobi--Bellman equation with linear-growth coefficients and a polynomial-growth reward function. Using this well-posedness result, we prove PIA's convergence by establishing a quantitative locally uniform…
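The paper works with continuous-time PDEs; the sketch below is only a finite-state, discrete-time analog of entropy-regularized policy iteration (often called soft policy iteration), included to make the evaluate/improve loop concrete. The toy MDP (`P`, `R`), the discount factor `gamma` (standing in for the discount rate), the temperature `lam`, and all function names are hypothetical and not taken from the paper.

```python
import math


def softmax(xs, lam):
    """Gibbs weights exp(x/lam) / sum exp(x/lam), shifted by the max for stability."""
    m = max(xs)
    ws = [math.exp((x - m) / lam) for x in xs]
    z = sum(ws)
    return [w / z for w in ws]


def evaluate(policy, P, R, gamma, lam, tol=1e-12):
    """Policy evaluation by fixed-point iteration of the linear equation
    V(s) = sum_a pi(a|s) [R(s,a) - lam*log pi(a|s) + gamma * sum_t P(t|s,a) V(t)]."""
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = []
        for s in range(n):
            total = 0.0
            for a, p_a in enumerate(policy[s]):
                if p_a > 0.0:  # convention 0 * log 0 = 0
                    cont = sum(P[s][a][t] * V[t] for t in range(n))
                    total += p_a * (R[s][a] - lam * math.log(p_a) + gamma * cont)
            V_new.append(total)
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def q_values(V, P, R, gamma):
    """Q(s,a) = R(s,a) + gamma * E[V(next state)]: discrete analog of H."""
    n = len(R)
    return [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
             for a in range(len(R[s]))] for s in range(n)]


def policy_iteration(P, R, gamma, lam, max_iter=50, tol=1e-10):
    """Alternate evaluation and Gibbs improvement, starting from the uniform policy.
    Returns the last evaluated V, the Gibbs policy built from it, and all V's."""
    n = len(R)
    policy = [[1.0 / len(R[s])] * len(R[s]) for s in range(n)]
    history = []
    for _ in range(max_iter):
        V = evaluate(policy, P, R, gamma, lam)
        history.append(V)
        Q = q_values(V, P, R, gamma)
        new_policy = [softmax(Q[s], lam) for s in range(n)]
        gap = max(abs(p - q) for s in range(n)
                  for p, q in zip(policy[s], new_policy[s]))
        policy = new_policy
        if gap < tol:
            break
    return V, policy, history


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP: P[s][a] is a distribution over next states.
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
    R = [[1.0, 0.0], [0.0, 2.0]]
    V, pi, hist = policy_iteration(P, R, gamma=0.8, lam=0.5)
    print("iterations:", len(hist))
    print("values:", V)
    print("policy:", pi)
```

The list `history` plays the role of the value sequence; in this discrete setting each Gibbs improvement step does not decrease the values, and the limit satisfies the soft Bellman equation $V(s) = \lambda \ln \sum_a e^{Q(s,a)/\lambda}$. This is a toy counterpart to, not a reproduction of, the convergence results the paper proves for the PDE.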

Taxonomy

Topics: Advanced Control Systems Optimization · Optimization and Variational Analysis