Scalable Online Exploration via Coverability
Philip Amortila, Dylan J. Foster, Akshay Krishnamurthy

TL;DR
This paper introduces a new exploration objective, $L_1$-Coverage, that improves online exploration in reinforcement learning by offering intrinsic complexity control, enabling efficient planning, and supporting scalable algorithms in high-dimensional MDPs.
Contribution
The paper proposes $L_1$-Coverage as a novel exploration objective that generalizes previous schemes and supports efficient, scalable algorithms for reinforcement learning in complex environments.
Findings
$L_1$-Coverage effectively guides exploration in high-dimensional MDPs.
The proposed algorithms are computationally efficient for online reinforcement learning.
Empirical results show successful exploration using $L_1$-Coverage with standard policy optimization methods.
Abstract
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation. We propose exploration objectives, i.e., policy optimization objectives that enable downstream maximization of any reward function, as a conceptual framework to systematize the study of exploration. Within this framework, we introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata:
1. Intrinsic complexity control. $L_1$-Coverage is associated with a structural parameter, $L_1$-Coverability, which reflects the intrinsic statistical difficulty of the underlying MDP, subsuming Block and Low-Rank MDPs.
2. Efficient planning. For a known MDP, optimizing $L_1$-Coverage efficiently reduces to standard policy optimization, allowing flexible integration with off-the-shelf methods such…
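To make the objective concrete, here is a minimal sketch that assumes $L_1$-Coverage at a layer $h$ takes the form $\Psi_{h,\varepsilon}(p) = \sup_{\pi} \mathbb{E}_{(x,a) \sim d_h^{\pi}}\big[ d_h^{\pi}(x,a) / (d_h^{p}(x,a) + \varepsilon\, d_h^{\pi}(x,a)) \big]$, where $d_h^{p} = \mathbb{E}_{\pi \sim p}[d_h^{\pi}]$ is the occupancy of the policy mixture $p$. This form is an assumption and should be checked against the paper's definition. The sketch further assumes a tabular MDP whose occupancy measures are already known, with the supremum restricted to a finite list of policies. The function name `l1_coverage` and its input layout are hypothetical.

```python
from typing import Sequence


def l1_coverage(occupancies: Sequence[Sequence[float]],
                weights: Sequence[float],
                eps: float) -> float:
    """Hypothetical sketch: evaluate an assumed form of L1-Coverage at one layer.

    occupancies[i][j] is d^{pi_i}(x, a) for the j-th state-action pair.
    weights[i] is the probability p assigns to policy pi_i.
    The sup over pi ranges only over the listed policies (an assumption).
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if len(occupancies) != len(weights):
        raise ValueError("one weight per policy is required")
    n_pairs = len(occupancies[0])
    # Mixture occupancy: d^p(x, a) = sum_i p_i * d^{pi_i}(x, a).
    d_p = [sum(w * occ[j] for w, occ in zip(weights, occupancies))
           for j in range(n_pairs)]
    best = 0.0
    for d_pi in occupancies:
        # E_{(x,a) ~ d^pi}[ d^pi / (d^p + eps * d^pi) ], written as a sum of
        # d^pi^2 / (d^p + eps * d^pi); pairs with d^pi = 0 contribute nothing,
        # and for d^pi > 0 the denominator is at least eps * d^pi > 0.
        value = sum(q * q / (d_p[j] + eps * q)
                    for j, q in enumerate(d_pi) if q > 0)
        best = max(best, value)
    return best


# Two deterministic policies that visit disjoint state-action pairs.
occ = [[1.0, 0.0], [0.0, 1.0]]
print(l1_coverage(occ, [0.5, 0.5], eps=0.1))  # uniform mixture
print(l1_coverage(occ, [1.0, 0.0], eps=0.1))  # mixture ignores the second policy
```

Under this assumed form, each ratio is at most $1/\varepsilon$, so the value never exceeds $1/\varepsilon$. A mixture that puts no mass on a policy reaching a state-action pair pays close to that cap (here $1/0.1 = 10$). The uniform mixture over $N$ policies with disjoint supports gives $1/(1/N + \varepsilon)$, which is $1/0.6 \approx 1.67$ in the example. Smaller values therefore indicate that $p$ covers every policy's occupancy well.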
Taxonomy
Topics: Robotic Path Planning Algorithms · Optimization and Search Problems · Teaching and Learning Programming
Methods: Q-Learning
