The Power of Resets in Online Reinforcement Learning

Zakaria Mhammedi; Dylan J. Foster; Alexander Rakhlin

arXiv:2404.15417 · cs.LG · April 29, 2024

TL;DR

This paper shows that local simulator access in online reinforcement learning (the ability to reset to previously observed states) enables sample-efficient learning with general function approximation under weaker assumptions than previously required, and it provides a computationally efficient algorithm with provable guarantees.

Contribution

It introduces new sample-efficient algorithms that leverage local simulator access to learn in low-coverability MDPs, expanding the theoretical understanding of RL in high-dimensional settings.
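For reference, the coverability coefficient of Xie et al. (2023) can be written roughly as follows. This is our own recollection and notation (layers $h \in [H]$, state-action occupancy measures $d_h^{\pi}$, policy class $\Pi$), not text from this paper, and the paper's exact normalization may differ:

```latex
C_{\mathrm{cov}} \;=\; \inf_{\mu_1,\dots,\mu_H \in \Delta(\mathcal{X}\times\mathcal{A})}\;
\sup_{\pi \in \Pi}\; \max_{h \in [H]}\;
\left\| \frac{d_h^{\pi}}{\mu_h} \right\|_{\infty}
```

Here $d_h^{\pi}(x,a)$ is the probability that policy $\pi$ visits the state-action pair $(x,a)$ at layer $h$. An MDP has low coverability when a single sequence of distributions $\mu_h$ covers the occupancies of every policy up to a small factor $C_{\mathrm{cov}}$.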

Findings

01. Efficient learning in low-coverability MDPs with $Q^{\star}$-realizability.

02. Tractability of Exogenous Block MDPs under local simulator access.

03. Introduction of RVFS, a computationally efficient algorithm with provable guarantees.
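$Q^{\star}$-realizability, as in the first finding above, is the standard assumption that a given function class contains the optimal action-value function at every layer. In our own notation (function classes $\mathcal{Q}_h$, horizon $H$, rewards $r_h$; assumed here, not quoted from the paper) it reads:

```latex
Q^{\star}_h(x,a) \;=\; \mathbb{E}\!\left[\, r_h + \max_{a'} Q^{\star}_{h+1}(x_{h+1},a') \,\middle|\, x_h = x,\ a_h = a \right],
\qquad Q^{\star}_{H+1} \equiv 0,
\qquad
Q^{\star}_h \in \mathcal{Q}_h \quad \forall\, h \in [H].
```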

Abstract

Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access, particularly in high-dimensional domains that require general function approximation. We explore the power of simulators through online reinforcement learning with local simulator access (or local planning), an RL protocol where the agent is allowed to reset to previously observed states and follow their dynamics during training. We use local simulator access to unlock new statistical guarantees that were previously out of reach:

- We show that MDPs with low coverability (Xie et al. 2023), a general structural condition that subsumes Block MDPs and Low-Rank MDPs, can be learned in a sample-efficient fashion with only $Q^{\star}$-realizability (realizability of the optimal state-action value function); existing online RL algorithms require significantly…
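The local simulator protocol described in the abstract can be illustrated with a small environment wrapper. This is a minimal sketch under assumptions not taken from the paper: a tabular, episodic MDP with integer states, transition probabilities stored in a dict keyed by `(layer, state, action)`, and a hypothetical `LocalSimulator` class with `reset`, `reset_to`, and `step` methods. Rewards are omitted. It only shows the access model (resets restricted to previously observed states), not the paper's algorithms.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical toy MDP format: (layer, state, action) -> [(next_state, prob), ...]
Transitions = Dict[Tuple[int, int, int], List[Tuple[int, float]]]


class LocalSimulator:
    """Sketch of an episodic environment with local simulator access.

    The agent may reset to the initial state, or to any (layer, state) pair it
    has already observed, and then follow the true dynamics from there.
    """

    def __init__(self, transitions: Transitions, init_state: int,
                 horizon: int, seed: int = 0):
        self.transitions = transitions
        self.init_state = init_state
        self.horizon = horizon              # layers are 0, ..., horizon - 1
        self.rng = random.Random(seed)
        self.observed: Set[Tuple[int, int]] = set()
        self.h = 0
        self.state = init_state

    def _observe(self) -> int:
        # Record the current (layer, state) so it becomes a legal reset target.
        self.observed.add((self.h, self.state))
        return self.state

    def reset(self) -> int:
        """Standard online reset to the initial state at layer 0."""
        self.h, self.state = 0, self.init_state
        return self._observe()

    def reset_to(self, h: int, state: int) -> int:
        """Local reset: only allowed to a previously observed (layer, state)."""
        if (h, state) not in self.observed:
            raise ValueError(f"state {state} at layer {h} was never observed")
        self.h, self.state = h, state
        return self.state

    def step(self, action: int) -> int:
        """Sample the next state from the true dynamics and record it."""
        if self.h >= self.horizon - 1:
            raise RuntimeError("episode is over")
        outcomes = self.transitions[(self.h, self.state, action)]
        next_states = [s for s, _ in outcomes]
        probs = [p for _, p in outcomes]
        self.state = self.rng.choices(next_states, weights=probs, k=1)[0]
        self.h += 1
        return self._observe()


if __name__ == "__main__":
    # Toy 2-layer MDP with deterministic transitions (illustrative only).
    toy = {(0, 0, 0): [(1, 1.0)], (0, 0, 1): [(2, 1.0)]}
    sim = LocalSimulator(toy, init_state=0, horizon=2)
    sim.reset()
    print(sim.step(0))         # 1
    print(sim.reset_to(1, 1))  # 1: (layer 1, state 1) was observed
    try:
        sim.reset_to(1, 2)     # state 2 has not been reached yet
    except ValueError as err:
        print("rejected:", err)
```

The key design point is that `reset_to` refuses unseen states: the agent cannot teleport anywhere (as with a generative model), only back to states it has actually reached, which is what distinguishes local simulator access from full simulator access.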

Peer Reviews

Decision: NeurIPS 2024 spotlight

Reviewer 01 · Rating 4 · Confidence 1

Strengths

The paper presents an extensive theoretical study.

Weaknesses

Practical applications of the algorithm remain questionable. The modifications themselves might seem trivial.

Taxonomy

Topics: Reinforcement Learning in Robotics · Blockchain Technology Applications and Security · Auction Theory and Applications