Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
Johannes Müller, Semih Cayci

TL;DR
This paper establishes that entropy regularization in discounted Markov decision processes leads to errors that decrease exponentially with inverse regularization strength, providing precise convergence rates and insights into the implicit bias of natural policy gradient methods.
Contribution
It gives the first exponential convergence rate analysis for entropy regularization in MDPs, with an upper bound and a lower bound that match up to a polynomial term, and extends the analysis to general convex potentials.
Findings
Error decreases exponentially with inverse regularization strength.
Natural policy gradient methods exhibit exponential decay of error over iterations.
The analysis extends to general convex potentials and their natural policy gradients.
Abstract
We study the error introduced by entropy regularization in infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength, both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order O(τ), where τ is the regularization strength. We provide a lower bound that matches our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized…
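The claim that the regularization error shrinks rapidly as the regularization strength goes to zero can be checked numerically on a small example. The following is a minimal sketch, not the paper's method: it assumes Shannon-entropy regularization with temperature `tau`, solves the regularized problem by soft value iteration, and measures the unregularized value gap of the resulting policy. The two-state MDP, the function names, and the chosen temperatures are all hypothetical and serve only as an illustration.

```python
import numpy as np

def soft_value_iteration(P, r, gamma, tau, iters=2000):
    """Softmax policy of the entropy-regularized MDP at temperature tau (assumed setup)."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)          # state-action values, shape (S, A)
        m = Q.max(axis=1)                # subtract the max for a stable log-sum-exp
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    Q = r + gamma * (P @ V)
    w = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    return w / w.sum(axis=1, keepdims=True)

def policy_value(P, r, gamma, pi):
    """Unregularized value of policy pi, from the linear Bellman equation."""
    P_pi = np.einsum("sa,sat->st", pi, P)    # state-to-state transitions under pi
    r_pi = (pi * r).sum(axis=1)              # expected one-step reward under pi
    return np.linalg.solve(np.eye(r.shape[0]) - gamma * P_pi, r_pi)

def optimal_value(P, r, gamma, iters=2000):
    """Unregularized optimal value by standard value iteration."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        V = (r + gamma * (P @ V)).max(axis=1)
    return V

# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
P = np.zeros((2, 2, 2))      # P[s, a, s'] = transition probability
P[:, 0, 0] = 1.0             # action 0 always leads to state 0
P[:, 1, 1] = 1.0             # action 1 always leads to state 1
r = np.array([[0.0, 0.5],
              [0.0, 1.0]])   # r[s, a]
gamma = 0.9

V_star = optimal_value(P, r, gamma)
taus = [1.0, 0.5, 0.25, 0.125]
gaps = []
for tau in taus:
    pi_tau = soft_value_iteration(P, r, gamma, tau)
    # Largest loss in unregularized value caused by regularizing with strength tau.
    gaps.append(float(np.max(V_star - policy_value(P, r, gamma, pi_tau))))

for tau, gap in zip(taus, gaps):
    print(f"tau={tau:<6} value gap={gap:.3e}")
```

Halving `tau` repeatedly and comparing successive gaps gives a rough sense of whether the decay looks linear in `tau` or faster. A single toy instance of this kind illustrates the statement but does not verify the rates proved in the paper.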
Taxonomy
Topics: Neural Networks and Applications · Statistical Methods and Inference · Distributed Sensor Networks and Detection Algorithms
Methods: Entropy Regularization
