Settling the Sample Complexity of Online Reinforcement Learning

Zihan Zhang; Yuxin Chen; Jason D. Lee; Simon S. Du

arXiv:2307.13586·cs.LG·April 30, 2025

TL;DR

This paper proves that a modified version of Monotonic Value Propagation (MVP), a model-based algorithm, achieves minimax-optimal regret in finite-horizon online RL for every sample size, without incurring any burn-in cost.

Contribution

It establishes the first regret bounds matching minimax lower bounds for all sample sizes in finite-horizon RL, removing the burn-in requirement.

Findings

01

Achieves regret of order $\min\{\sqrt{SAH^3K}, HK\}$ (modulo log factors), matching minimax lower bounds.

02

Provides a minimax-optimal PAC sample complexity of $SAH^3/\varepsilon^2$ (modulo log factors).

03

Develops new analysis techniques to handle statistical dependencies in online RL.
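
As a rough sanity check on how a regret bound relates to a PAC sample complexity, here is a standard heuristic conversion. It is not the paper's argument, it drops constants and log factors, and the symbol $\varepsilon$ (target accuracy) is introduced here for illustration. Requiring the regret averaged over $K$ episodes to be at most $\varepsilon$ gives:

```latex
\begin{equation*}
\frac{\sqrt{SAH^3K}}{K} \;=\; \sqrt{\frac{SAH^3}{K}} \;\le\; \varepsilon
\quad\Longleftrightarrow\quad
K \;\ge\; \frac{SAH^3}{\varepsilon^2}.
\end{equation*}
```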

Abstract

A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a ``large-sample'' regime, imposing enormous burn-in cost in order for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem for the context of finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of Monotonic Value Propagation (MVP), a model-based algorithm proposed by \cite{zhang2020reinforcement}, achieves a regret on the order of (modulo log factors) \begin{equation*} \min\big\{ \sqrt{SAH^3K}, \,HK \big\}, \end{equation*} where $S$ is the number of states, $A$ is the number of actions,…
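
To make the two regimes of the bound concrete, the following minimal sketch evaluates $\min\{\sqrt{SAH^3K}, HK\}$ numerically. It ignores constants and log factors, and the function names `regret_bound` and `crossover_episodes` are hypothetical helpers, not anything from the paper. The crossover follows from simple algebra: $\sqrt{SAH^3K} \le HK$ exactly when $K \ge SAH$.

```python
import math


def regret_bound(S: int, A: int, H: int, K: int) -> float:
    """Evaluate min{sqrt(S*A*H^3*K), H*K}, with constants and log factors dropped."""
    return min(math.sqrt(S * A * H**3 * K), H * K)


def crossover_episodes(S: int, A: int, H: int) -> int:
    """Number of episodes K at which the two terms coincide.

    sqrt(S*A*H^3*K) <= H*K  <=>  S*A*H^3*K <= H^2*K^2  <=>  K >= S*A*H.
    """
    return S * A * H


if __name__ == "__main__":
    # Illustrative problem sizes only; these values are assumptions.
    S, A, H = 10, 5, 20
    print("crossover K =", crossover_episodes(S, A, H))
    for K in (100, 1_000, 10_000):
        print(K, regret_bound(S, A, H, K))
```

Below the crossover the trivial term $HK$ is the smaller one; above it the $\sqrt{SAH^3K}$ term takes over and the bound grows sublinearly in $K$.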

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Mental Health Research Topics