Scalable Online Exploration via Coverability

Philip Amortila; Dylan J. Foster; Akshay Krishnamurthy

arXiv:2403.06571 · cs.LG · June 6, 2024 · 1 citation

TL;DR

This paper introduces a new exploration objective, $L_1$-Coverage, for online exploration in reinforcement learning. The objective comes with a structural parameter that controls statistical complexity, reduces planning to standard policy optimization, and supports scalable algorithms in high-dimensional MDPs.

Contribution

The paper proposes $L_1$-Coverage as a novel exploration objective that generalizes previous schemes and supports efficient, scalable algorithms for reinforcement learning in high-dimensional environments that require function approximation.

Findings

1. $L_1$-Coverage effectively guides exploration in high-dimensional MDPs.
2. The proposed algorithms are computationally efficient for online reinforcement learning.
3. Empirical results show successful exploration using $L_1$-Coverage with standard policy optimization methods.

Abstract

Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation. We propose exploration objectives (policy optimization objectives that enable downstream maximization of any reward function) as a conceptual framework to systematize the study of exploration. Within this framework, we introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata:

1. Intrinsic complexity control. $L_1$-Coverage is associated with a structural parameter, $L_1$-Coverability, which reflects the intrinsic statistical difficulty of the underlying MDP, subsuming Block and Low-Rank MDPs.
2. Efficient planning. For a known MDP, optimizing $L_1$-Coverage efficiently reduces to standard policy optimization, allowing flexible integration with off-the-shelf methods such…
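
The abstract names the objective but is cut off before defining it. The snippet below is a minimal sketch on a toy tabular layer, assuming that $L_1$-Coverage of a policy mixture $p$ at layer $h$ takes the form $\Psi_{h,\varepsilon}(p) = \sup_{\pi \in \Pi} \mathbb{E}^{\pi}\big[d_h^{\pi}(x_h,a_h) / (d_h^{p}(x_h,a_h) + \varepsilon\, d_h^{\pi}(x_h,a_h))\big]$, where $d_h^{\pi}$ is the occupancy measure of $\pi$ and $d_h^{p}$ averages it over $p$. This is our reading of the paper, so check the PDF for the exact definition. The names `mixture_occupancy` and `l1_coverage`, the toy states, and the value of `eps` are hypothetical and do not come from the official repository.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical tabular setup: an occupancy measure d_h^pi maps
# (state, action) pairs at a fixed layer h to visitation probabilities.
Occupancy = Dict[Tuple[str, str], float]


def mixture_occupancy(occs: Sequence[Occupancy], weights: Sequence[float]) -> Occupancy:
    """Occupancy of a policy mixture p: d_h^p = sum_i w_i * d_h^{pi_i}."""
    mix: Occupancy = {}
    for occ, w in zip(occs, weights):
        for key, prob in occ.items():
            mix[key] = mix.get(key, 0.0) + w * prob
    return mix


def l1_coverage(policy_occs: Sequence[Occupancy], mix: Occupancy, eps: float) -> float:
    """Assumed objective (not the official code): max over policies pi of
    E^pi[d^pi / (d^p + eps * d^pi)], computed exactly on a finite table."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    worst = 0.0
    for occ in policy_occs:
        # E^pi[f] = sum over (x, a) of d^pi(x, a) * f(x, a); zero-probability
        # pairs contribute nothing, and eps > 0 keeps the denominator positive.
        value = sum(
            prob * prob / (mix.get(key, 0.0) + eps * prob)
            for key, prob in occ.items()
            if prob > 0.0
        )
        worst = max(worst, value)
    return worst


# Two deterministic policies that reach disjoint (state, action) pairs.
policies = [{("s0", "a0"): 1.0}, {("s1", "a1"): 1.0}]
uniform = mixture_occupancy(policies, [0.5, 0.5])
lopsided = mixture_occupancy(policies, [1.0, 0.0])
print(l1_coverage(policies, uniform, eps=0.1))   # 1 / (0.5 + 0.1)
print(l1_coverage(policies, lopsided, eps=0.1))  # 1 / (0.0 + 0.1)
```

The lopsided mixture never visits the pair reached by the second policy, so its score hits the ceiling $1/\varepsilon = 10$ (each ratio is at most $1/\varepsilon$); the uniform mixture covers both pairs and scores about 1.67. Under this assumed form, $L_1$-Coverability would be the smallest score attainable over all mixtures $p$, which this snippet does not optimize.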

Code & Models

Repositories

philip-amortila/l1-coverability (official)

Taxonomy

Topics: Robotic Path Planning Algorithms · Optimization and Search Problems · Teaching and Learning Programming

Methods: Q-Learning