Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes

Johannes Müller; Semih Cayci

arXiv:2406.04163 · math.OC · December 16, 2025

TL;DR

This paper shows that the error introduced by entropy regularization in discounted Markov decision processes decreases exponentially in the inverse regularization strength. It gives precise convergence rates and insight into the implicit bias of natural policy gradient methods.

Contribution

It offers the first exponential convergence rate analysis for entropy regularization in MDPs, with a lower bound that matches the upper bound up to a polynomial term, and extends the analysis to general convex potentials.

Findings

1. The regularization error decreases exponentially in the inverse regularization strength.

2. Natural policy gradient methods exhibit exponential decay of the error over iterations.

3. The analysis extends to general convex potentials and their natural policy gradients.

Abstract

We study the error introduced by entropy regularization in infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength, both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order $O(\tau)$, where $\tau$ is the regularization strength. We provide a lower bound that matches our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized…
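The exponential dependence on $1/\tau$ can be checked numerically on a toy problem. The sketch below is not taken from the paper. It assumes a tabular MDP given as arrays `P` (shape states × actions × states) and `r` (shape states × actions), solves the entropy-regularized problem by soft value iteration, and then measures how much unregularized value the resulting softmax policy loses relative to the optimal policy. The function names (`soft_optimal_policy`, `regularization_gap`) and the single-state demo MDP are hypothetical choices made for illustration, and reading the "error in value" as this gap is an assumption of the sketch.

```python
import numpy as np


def soft_optimal_policy(P, r, gamma, tau, iters=2000):
    """Entropy-regularized optimal policy via soft value iteration.

    Iterates V(s) = tau * log sum_a exp(Q(s, a) / tau), where
    Q(s, a) = r(s, a) + gamma * sum_s' P(s, a, s') V(s').
    """
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)                      # shape (S, A)
        m = Q.max(axis=1)                            # subtract max for stability
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    Q = r + gamma * (P @ V)
    w = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    return w / w.sum(axis=1, keepdims=True)          # softmax of Q / tau


def policy_value(P, r, gamma, pi):
    """Unregularized value of pi from (I - gamma * P_pi) V = r_pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = (pi * r).sum(axis=1)
    return np.linalg.solve(np.eye(r.shape[0]) - gamma * P_pi, r_pi)


def optimal_value(P, r, gamma, iters=2000):
    """Unregularized optimal value via standard value iteration."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        V = (r + gamma * (P @ V)).max(axis=1)
    return V


def regularization_gap(P, r, gamma, tau):
    """Largest per-state loss in unregularized value at strength tau."""
    pi_tau = soft_optimal_policy(P, r, gamma, tau)
    gap = optimal_value(P, r, gamma) - policy_value(P, r, gamma, pi_tau)
    return float(gap.max())


if __name__ == "__main__":
    # Hypothetical demo: one state, two actions with rewards 1 and 0.
    P = np.ones((1, 2, 1))
    r = np.array([[1.0, 0.0]])
    for tau in (1.0, 0.5, 0.25, 0.125):
        print(f"tau={tau:<6} gap={regularization_gap(P, r, 0.9, tau):.6f}")
```

For this single-state demo the gap has the closed form $\frac{1}{(1-\gamma)(1+e^{1/\tau})}$, which decays like $e^{-1/\tau}$ as $\tau \to 0$ rather than linearly in $\tau$. The sketch only illustrates the qualitative behavior on one easy instance; the problem-specific exponent and the matching lower bound are the subject of the paper itself.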

Taxonomy

Topics: Neural Networks and Applications · Statistical Methods and Inference · Distributed Sensor Networks and Detection Algorithms

Methods: Entropy Regularization