Entropy annealing for policy mirror descent in continuous time and space
Deven Sethi, David Šiška, Yufei Zhang

TL;DR
This paper analyzes how entropy regularization affects the convergence of policy gradient methods for continuous-time stochastic exit time control problems. It shows that annealing the entropy level drives a policy mirror descent flow to the optimal solution of the unregularized problem at quantifiable rates.
Contribution
It introduces a continuous-time policy mirror descent framework with entropy annealing, providing convergence guarantees for both regularized and unregularized problems.
Findings
Fixed entropy levels lead to exponential convergence to regularized optima.
Entropy levels that decay at suitable polynomial rates yield convergence to the solution of the unregularized problem, with explicit rates.
The analysis is carried out on infinite-dimensional spaces of Markov kernels.
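To make the fixed-versus-decaying contrast concrete, here is a minimal sketch on a hypothetical two-state, two-action discounted-cost problem. It is a discrete-time, tabular stand-in, assumed purely for illustration, not the paper's continuous-time exit-time setting. Each iteration evaluates the entropy-regularized Q-function and applies a KL mirror descent step. The entropy weight is either held at 0.5 or annealed as 0.5/sqrt(n+1). All names and constants (`GAMMA`, `COST`, `P`, `eta`, the schedule) are illustrative choices, not the rate conditions from the paper.

```python
import math

# Hypothetical toy problem (illustrative only, not from the paper):
# 2 states, 2 actions, discounted running cost. COST[s][a], P[s][a][s'].
GAMMA = 0.9
COST = [[1.0, 0.5], [0.0, 2.0]]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
S, A = len(COST), len(COST[0])


def q_from_v(V):
    """One-step lookahead: Q(s, a) = c(s, a) + GAMMA * E[V(s') | s, a]."""
    return [[COST[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(S))
             for a in range(A)] for s in range(S)]


def evaluate(log_pi, tau, iters=300):
    """Regularized value and Q-function of pi = exp(log_pi).

    The per-state penalty is tau * KL(pi(.|s) || uniform); tau = 0 gives the plain cost.
    """
    V = [0.0] * S
    for _ in range(iters):  # fixed-point iteration of the policy Bellman equation
        Q = q_from_v(V)
        V = [sum(math.exp(lp) * (q + tau * (lp + math.log(A)))
                 for lp, q in zip(log_pi[s], Q[s])) for s in range(S)]
    return V, q_from_v(V)


def mirror_step(log_pi, Q, eta, tau):
    """KL mirror descent step: new pi proportional to pi**(1/(1+eta*tau)) * exp(-eta*Q/(1+eta*tau))."""
    new_log_pi = []
    for s in range(S):
        logits = [(lp - eta * q) / (1.0 + eta * tau) for lp, q in zip(log_pi[s], Q[s])]
        m = max(logits)  # log-sum-exp normalization, done in log space for stability
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        new_log_pi.append([x - lse for x in logits])
    return new_log_pi


def run(tau_schedule, steps=200, eta=1.0):
    """Iterate mirror descent from the uniform policy with entropy weight tau_schedule(n)."""
    log_pi = [[-math.log(A)] * A for _ in range(S)]
    for n in range(steps):
        tau = tau_schedule(n)
        _, Q = evaluate(log_pi, tau)
        log_pi = mirror_step(log_pi, Q, eta, tau)
    return log_pi


def optimality_gap(log_pi):
    """Max over states of V^pi(s) - V*(s) for the UNregularized problem."""
    V_star = [0.0] * S
    for _ in range(300):  # value iteration for the unregularized optimum
        V_star = [min(row) for row in q_from_v(V_star)]
    V_pi, _ = evaluate(log_pi, 0.0)
    return max(vp - vs for vp, vs in zip(V_pi, V_star))


fixed_gap = optimality_gap(run(lambda n: 0.5))                        # constant entropy level
annealed_gap = optimality_gap(run(lambda n: 0.5 / math.sqrt(n + 1)))  # polynomial decay
print(f"fixed tau:    unregularized gap = {fixed_gap:.2e}")
print(f"annealed tau: unregularized gap = {annealed_gap:.2e}")
```

With the constant weight, the iterates settle near the regularized optimum. That optimum is a stochastic policy whose unregularized cost stays visibly above the true optimum, which is the regularization bias. With the decaying weight, the bias shrinks as the iterates approach the optimal deterministic policy. Storing log-probabilities keeps the very small probabilities produced by annealing from underflowing.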
Abstract
Entropy regularization has been widely used in policy optimization algorithms to enhance exploration and the robustness of the optimal control; however, it also introduces an additional regularization bias. This work quantifies the impact of entropy regularization on the convergence of policy gradient methods for stochastic exit time control problems. We analyze a continuous-time policy mirror descent dynamics, which updates the policy based on the gradient of an entropy-regularized value function and adjusts the strength of entropy regularization as the algorithm progresses. We prove that with a fixed entropy level, the mirror descent dynamics converges exponentially to the optimal solution of the regularized problem. We further show that when the entropy level decays at suitable polynomial rates, the annealed flow converges to the solution of the unregularized problem at a rate of…
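The kind of update described in the abstract can be illustrated with a generic template. The equations below are a sketch under stated assumptions: a discrete-time step with step size η, a cost-type Q-function Q^τ_π, a reference measure μ on actions, and the KL divergence used both as the entropy penalty and as the mirror descent Bregman divergence. The paper's exact continuous-time, exit-time formulation and notation may differ. Solving the first-order condition of one step and then letting η → 0 with t = kη gives a flow on log-densities:

```latex
% Generic template (assumed for illustration): state x, action a,
% reference measure \mu, entropy weight \tau, step size \eta.
\pi_{k+1}(\cdot \mid x) \in \operatorname*{arg\,min}_{p}
  \Big\{ \eta \Big( \int Q^{\tau}_{\pi_k}(x,a)\, p(da) + \tau\, \mathrm{KL}(p \,\|\, \mu) \Big)
        + \mathrm{KL}\big(p \,\|\, \pi_k(\cdot \mid x)\big) \Big\}
\\[4pt]
\Longrightarrow\quad
\log \frac{d\pi_{k+1}}{d\mu}(a \mid x)
  = \frac{\log \frac{d\pi_k}{d\mu}(a \mid x) - \eta\, Q^{\tau}_{\pi_k}(x,a)}{1 + \eta\tau} - c_k(x)
\\[4pt]
\xrightarrow{\;\eta \to 0,\ t = k\eta\;}\quad
\partial_t \log \frac{d\pi_t}{d\mu}(a \mid x)
  = -\Big( Q^{\tau}_{\pi_t}(x,a) + \tau \log \frac{d\pi_t}{d\mu}(a \mid x) \Big)
    + \int \Big( Q^{\tau}_{\pi_t}(x,a') + \tau \log \frac{d\pi_t}{d\mu}(a' \mid x) \Big)\, \pi_t(da' \mid x)
```

Here c_k(x) is the normalizing constant, and the integral term in the last line keeps π_t a probability measure. Replacing the constant weight τ with a decreasing schedule τ_t in that line is one way to write an annealed flow.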
Taxonomy
Topics: Economic Policies and Impacts · Fiscal Policies and Political Economy
Methods: Entropy Regularization
