The Power of Resets in Online Reinforcement Learning
Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin

TL;DR
This paper demonstrates that local simulator access in online reinforcement learning enables efficient learning in complex environments under weaker assumptions than previously required, with theoretical guarantees and practical algorithms.
Contribution
It introduces new sample-efficient algorithms leveraging local simulator access for low coverability MDPs, expanding the theoretical understanding of RL in high-dimensional settings.
Findings
Efficient learning in low coverability MDPs with $Q^{\star}$-realizability.
Tractability of Exogenous Block MDPs under local simulator access.
Introduction of RVFS, a computationally efficient algorithm with provable guarantees.
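For readers unfamiliar with the two conditions named in the findings, they can be written out as follows. This is a sketch using the standard definitions from the literature (coverability as in Xie et al. 2023); the notation ($\mathcal{X}$ for states, $\mathcal{A}$ for actions, $H$ for the horizon, $d_h^{\pi}$ for the occupancy measure of policy $\pi$ at layer $h$, $\mathcal{Q}$ for the learner's function class) is an assumption of this note and may differ from the paper's.

```latex
% Coverability: the best worst-case density ratio achievable by any
% single family of reference distributions \mu_1, \dots, \mu_H.
C_{\mathrm{cov}}
  \;=\; \inf_{\mu_1,\dots,\mu_H \in \Delta(\mathcal{X}\times\mathcal{A})}
        \;\sup_{\pi \in \Pi,\; h \in [H]}
        \left\| \frac{d_h^{\pi}}{\mu_h} \right\|_{\infty}

% Q^\star-realizability: the function class contains the optimal
% action-value function at every layer.
Q_h^{\star} \in \mathcal{Q}_h \qquad \text{for all } h \in [H]
```

A small $C_{\mathrm{cov}}$ says that one reference distribution per layer covers every policy's state-action occupancy, even though the learner is never given that distribution.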
Abstract
Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning), an RL protocol where the agent is allowed to reset to previously observed states and follow their dynamics during training. We use local simulator access to unlock new statistical guarantees that were previously out of reach:
- We show that MDPs with low coverability (Xie et al. 2023) -- a general structural condition that subsumes Block MDPs and Low-Rank MDPs -- can be learned in a sample-efficient fashion with only $Q^{\star}$-realizability (realizability of the optimal state-value function); existing online RL algorithms require significantly…
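The local simulator protocol described in the abstract can be illustrated with a small wrapper. This is a minimal sketch, not the paper's implementation: `ChainMDP` and `LocalSimulator` are hypothetical names introduced here, and the toy chain environment is an assumption chosen only to make the example runnable. The one property it is meant to show is that the agent may reset to any state it has already observed, and to no other state.

```python
from typing import Optional, Set, Tuple


class ChainMDP:
    """Hypothetical toy environment: states 0..length-1 on a line.

    Action 1 moves one step right, action 0 stays in place.
    Reward 1.0 is paid whenever the next state is the last one.
    """

    def __init__(self, length: int = 5) -> None:
        self.length = length

    def initial_state(self) -> int:
        return 0

    def step(self, state: int, action: int) -> Tuple[int, float]:
        next_state = min(state + action, self.length - 1)
        reward = 1.0 if next_state == self.length - 1 else 0.0
        return next_state, reward


class LocalSimulator:
    """Wraps an MDP so that resets are limited to previously observed states."""

    def __init__(self, mdp: ChainMDP) -> None:
        self.mdp = mdp
        self.state = mdp.initial_state()
        # The initial state counts as observed from the start.
        self.visited: Set[int] = {self.state}

    def reset(self, state: Optional[int] = None) -> int:
        """Reset to the initial state, or to any state seen earlier."""
        if state is None:
            state = self.mdp.initial_state()
        if state not in self.visited:
            raise ValueError("can only reset to previously observed states")
        self.state = state
        return state

    def step(self, action: int) -> Tuple[int, float]:
        """Follow the true dynamics from the current state and record the result."""
        next_state, reward = self.mdp.step(self.state, action)
        self.visited.add(next_state)
        self.state = next_state
        return next_state, reward
```

For example, after `sim = LocalSimulator(ChainMDP(length=4))` and one call to `sim.step(1)`, the call `sim.reset(1)` succeeds, while `sim.reset(3)` raises `ValueError` because state 3 has not been reached yet. A plain online RL protocol corresponds to only ever calling `reset()` with no argument.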
Peer Reviews
Decision: NeurIPS 2024 spotlight
Paper presents an extensive theoretical study.
Practical applications of the algorithm remain questionable. The modifications themselves might seem trivial.
Taxonomy
Topics: Reinforcement Learning in Robotics · Blockchain Technology Applications and Security · Auction Theory and Applications
