Exploratory HJB equations and their convergence
Wenpin Tang, Paul Yuming Zhang, Xun Yu Zhou

TL;DR
This paper analyzes the exploratory Hamilton-Jacobi-Bellman equation in continuous-time reinforcement learning, establishing well-posedness, regularity, and convergence results, and applies these to temperature control in simulated annealing.
Contribution
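It proves the well-posedness and regularity of the viscosity solution to the exploratory HJB equation, and shows that the exploratory control problem converges to the classical stochastic control problem as the level of exploration decays to zero. For the exploratory temperature control problem, it derives an explicit rate of convergence.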
Findings
Established well-posedness and regularity of the viscosity solution to the exploratory HJB equation.
Proved convergence of the exploratory control problem to the classical stochastic control problem as the level of exploration decays to zero.
Derived an explicit rate of convergence for the exploratory temperature control problem.
Abstract
We study the exploratory Hamilton-Jacobi-Bellman (HJB) equation arising from the entropy-regularized exploratory control problem, which was formulated by Wang, Zariphopoulou and Zhou (J. Mach. Learn. Res., 21, 2020) in the context of reinforcement learning in continuous time and space. We establish the well-posedness and regularity of the viscosity solution to the equation, as well as the convergence of the exploratory control problem to the classical stochastic control problem when the level of exploration decays to zero. We then apply the general results to the exploratory temperature control problem, which was introduced by Gao, Xu and Zhou (arXiv:2005.04057, 2020) to design an endogenous temperature schedule for simulated annealing (SA) in the context of non-convex optimization. We derive an explicit rate of convergence for this problem as exploration diminishes to zero, and find…
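The convergence statement in the abstract has a simple numerical analogue. Entropy regularization replaces a hard maximum over actions with a log-integral-exp "soft maximum" weighted by an exploration level, and that soft maximum approaches the hard maximum as the exploration level goes to zero. The sketch below is an illustration only, not the paper's method: the action set U = [-1, 1], the toy function f, the grid size, and the name soft_max_integral are all assumptions introduced here.

```python
import numpy as np

def soft_max_integral(f_vals, du, lam):
    """Approximate lam * log( integral of exp(f(u)/lam) du ) on a uniform grid.

    The maximum is subtracted before exponentiating (log-sum-exp trick)
    so that small values of lam do not overflow.
    """
    m = f_vals.max()
    return m + lam * np.log(np.sum(np.exp((f_vals - m) / lam)) * du)

# Hypothetical action set U = [-1, 1] and toy objective (assumptions, not from the paper).
u = np.linspace(-1.0, 1.0, 20001)
du = u[1] - u[0]
f = -(u - 0.3) ** 2  # maximized at u = 0.3, where f = 0

hard_max = f.max()
lams = (1.0, 0.1, 0.01, 0.001)
gaps = {lam: abs(soft_max_integral(f, du, lam) - hard_max) for lam in lams}

for lam, gap in gaps.items():
    print(f"lambda = {lam:g}   |soft max - hard max| = {gap:.5f}")
```

The printed gap shrinks as lambda decreases. How fast it shrinks here is a property of this toy function and should not be read as the convergence rate established in the paper.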
Taxonomy
Topics: Mathematical Biology Tumor Growth · Advanced Bandit Algorithms Research · Markov Chains and Monte Carlo Methods
