KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

Tadashi Kozuno; Wenhao Yang; Nino Vieillard; Toshinori Kitamura; Yunhao Tang; Jincheng Mei; Pierre Ménard; Mohammad Gheshlaghi Azar; Michal Valko; Rémi Munos; Olivier Pietquin; Matthieu Geist; Csaba Szepesvári

arXiv:2205.14211·cs.LG·May 31, 2022

TL;DR

This paper proves that mirror descent value iteration (MDVI), a simple model-free reinforcement learning algorithm with KL and entropy regularization, has nearly minimax-optimal sample complexity for finding an ε-optimal policy with a generative model when ε is sufficiently small.

Contribution

It gives the first theoretical result showing that a simple model-free algorithm without variance reduction can be nearly minimax-optimal in sample complexity when a generative model is available.

Findings

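01. Mirror descent value iteration (MDVI) is nearly minimax-optimal for finding an ε-optimal policy when ε is sufficiently small.

02. The analysis covers MDVI as it stands, with no variance-reduction technique added.

03. This is the first result showing that a simple model-free algorithm without variance reduction can be nearly minimax-optimal in this setting.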
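For context, "nearly minimax-optimal" in the generative-model setting is usually read against the known lower bound of Azar et al. (2013). The expression below is background, not a quotation from this page. The symbols $|\mathcal{S}|$ and $|\mathcal{A}|$ (numbers of states and actions), $\gamma$ (discount factor), and $N$ (total number of samples) are assumed notation, and $\widetilde{O}$ hides logarithmic factors.

$$
N = \widetilde{O}\!\left(\frac{|\mathcal{S}|\,|\mathcal{A}|}{(1-\gamma)^{3}\,\varepsilon^{2}}\right)
$$

An algorithm whose sample requirement matches this rate up to logarithmic factors, at least for a range of $\varepsilon$, is called nearly minimax-optimal on that range.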

Abstract

In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. Particularly, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that it is nearly minimax-optimal for finding an $ε$-optimal policy when $ε$ is sufficiently small. This is the first theoretical result that demonstrates that a simple model-free algorithm without variance-reduction can be nearly minimax-optimal under the considered setting.
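To make the kind of update described in the abstract concrete, here is a minimal tabular sketch of a KL- and entropy-regularized value iteration that queries a generative model. It is written in the spirit of MDVI and is not the authors' exact procedure or analysis setup. The function name `mdvi`, the parameter names `lam` (KL weight), `tau` (entropy weight), `n_samples`, and the toy MDP are all assumptions made for illustration.

```python
import numpy as np

def mdvi(P, r, gamma, lam, tau, n_iters, n_samples, rng):
    """Tabular MDVI-style sketch with a generative model (illustrative only).

    P: (S, A, S) transition probabilities, used only to draw samples.
    r: (S, A) rewards.
    lam, tau: KL and entropy regularization weights (hypothetical names).
    """
    S, A = r.shape
    q = np.zeros((S, A))
    log_pi = np.full((S, A), -np.log(A))  # start from the uniform policy
    for _ in range(n_iters):
        # Policy update: pi_new proportional to pi^(lam/(lam+tau)) * exp(q/(lam+tau)).
        logits = (lam * log_pi + q) / (lam + tau)
        m = logits.max(axis=1, keepdims=True)  # stabilize the log-sum-exp
        log_z = m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
        log_pi = logits - log_z
        # Regularized state value: closed form of the maximized objective.
        v = (lam + tau) * log_z[:, 0]
        # Generative model: draw n_samples next states for every (s, a).
        q_new = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                nxt = rng.choice(S, size=n_samples, p=P[s, a])
                q_new[s, a] = r[s, a] + gamma * v[nxt].mean()
        q = q_new
    return np.exp(log_pi), q

# Toy 2-state, 2-action MDP (hypothetical): action 0 stays, action 1 switches
# state, and only state 1 pays a reward.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
r = np.array([[0.0, 0.0], [1.0, 1.0]])
pi, q = mdvi(P, r, gamma=0.9, lam=0.1, tau=0.01, n_iters=200, n_samples=4,
             rng=np.random.default_rng(0))
```

The sketch uses a fresh batch of samples at every iteration and applies no variance-reduction step, which is the "simple" regime the abstract refers to. The regularization weights and iteration count here are arbitrary and would need tuning in practice.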

Datasets

misovalko/my-research-papers (dataset · 21 downloads)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms

Methods: Entropy Regularization