Convergence Analysis for Entropy-Regularized Control Problems: A Probabilistic Approach
Jin Ma, Gaozhan Wang, Jianfeng Zhang

TL;DR
This paper proves the convergence of the Policy Iteration Algorithm (PIA) for entropy-regularized stochastic control problems using a probabilistic approach, achieving super-exponential rates in the finite horizon model and in the infinite horizon model with a large discount factor, and extending to diffusion control in one dimension.
Contribution
It introduces a simple probabilistic proof for PIA convergence in continuous-time entropy-regularized control, avoiding complex PDE estimates and extending results to diffusion control in one dimension.
Findings
Proves PIA convergence with a simple probabilistic method.
Achieves super-exponential convergence rates in the finite horizon model and in the infinite horizon model with a large discount factor.
Extends convergence results to one-dimensional diffusion control cases.
Abstract
In this paper we investigate the convergence of the Policy Iteration Algorithm (PIA) for a class of general continuous-time entropy-regularized stochastic control problems. In particular, instead of employing sophisticated PDE estimates for the iterative PDEs involved in the algorithm (see, e.g., Huang-Wang-Zhou (2025)), we provide a simple proof from scratch for the convergence of the PIA. Our approach builds on probabilistic representation formulae for solutions of PDEs and their derivatives. Moreover, in the finite horizon model and in the infinite horizon model with a large discount factor, similar arguments lead to a super-exponential rate of convergence with no extra difficulty. Finally, with some extra effort we show that our approach can be extended to the diffusion control case in the one-dimensional setting, also with a super-exponential rate of convergence.
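As a rough illustration of what a policy iteration step looks like under entropy regularization, here is a minimal sketch in a discrete-time, finite-state analog. It is not the paper's continuous-time algorithm or its probabilistic proof. The function name `soft_policy_iteration`, the temperature `lam`, the discount `gamma`, and the random test model are all assumptions made for this example. Each iteration evaluates the current policy by solving a linear system, then replaces it with the Gibbs (softmax) policy built from the resulting Q-values.

```python
import numpy as np

def soft_policy_iteration(P, r, gamma, lam, n_iter):
    """Entropy-regularized policy iteration on a finite MDP (illustrative only).

    P: (S, A, S) transition probabilities, r: (S, A) rewards,
    gamma: discount factor, lam: entropy temperature (hypothetical names).
    Returns the last improved policy, the value of the policy evaluated in
    the final iteration, and the list of values from every iteration.
    """
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
    history = []
    for _ in range(n_iter):
        # Policy evaluation: solve (I - gamma * P_pi) V = r_pi + lam * entropy(pi)
        P_pi = np.einsum("sa,sat->st", pi, P)
        r_pi = np.sum(pi * (r - lam * np.log(pi)), axis=1)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        history.append(V)
        # Policy improvement: Gibbs update, pi(a|s) proportional to exp(Q(s,a)/lam)
        Q = r + gamma * (P @ V)
        logits = Q / lam
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
    return pi, V, history

# Small random model, purely for demonstration.
rng = np.random.default_rng(0)
S, A = 3, 2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into probabilities
r = rng.random((S, A))
gamma, lam = 0.5, 0.5

pi, V, history = soft_policy_iteration(P, r, gamma, lam, n_iter=60)
```

In this discrete analog the values in `history` are non-decreasing from one iteration to the next, and at convergence `V` satisfies the soft Bellman equation `V = lam * log(sum_a exp(Q / lam))`. The convergence rates proved in the paper concern the continuous-time problem and are not reproduced by this toy model.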
Taxonomy
Topics: Advanced Research in Systems and Signal Processing
Methods: Diffusion
