Active Coverage for PAC Reinforcement Learning

Aymen Al-Marjani; Andrea Tirinzoni; Emilie Kaufmann

arXiv:2306.13601 · cs.LG · June 26, 2023

TL;DR

This paper introduces a formal framework for active coverage in episodic reinforcement learning, together with an instance-dependent sample-complexity lower bound and a nearly matching algorithm that supports exploration and best-policy identification.

Contribution

It formalizes active coverage in episodic MDPs, derives an instance-dependent lower bound, and proposes CovGame, a nearly optimal algorithm adaptable to multiple PAC RL tasks.

Findings

01. CovGame nearly matches the instance-dependent lower bound on the sample complexity of active coverage.

02. The resulting exploration algorithm has a sample complexity below the minimax one in easy-to-explore MDPs.

03. Combining CovGame-based exploration with policy elimination yields a computationally efficient best-policy identification method.

Abstract

Collecting and leveraging data with good coverage properties plays a crucial role in different aspects of reinforcement learning (RL), including reward-free exploration and offline learning. However, the notion of "good coverage" really depends on the application at hand, as data suitable for one context may not be so for another. In this paper, we formalize the problem of active coverage in episodic Markov decision processes (MDPs), where the goal is to interact with the environment so as to fulfill given sampling requirements. This framework is sufficiently flexible to specify any desired coverage property, making it applicable to any problem that involves online exploration. Our main contribution is an instance-dependent lower bound on the sample complexity of active coverage and a simple game-theoretic algorithm, CovGame, that nearly matches it. We then show that CovGame can be used…
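
To make the problem statement concrete, the sketch below shows what "fulfilling given sampling requirements" can look like in a tiny tabular episodic MDP. It is an illustration only and is not CovGame: the deterministic toy dynamics, the myopic largest-deficit rule, and every name in it (`deficit`, `greedy_policy`, `episodes_until_covered`, `toy_transition`, `toy_requirements`) are assumptions introduced here, not taken from the paper.

```python
from collections import Counter


def deficit(requirements, counts, key):
    """Remaining number of visits still required for a (step, state, action) triple."""
    return max(0, requirements.get(key, 0) - counts[key])


def greedy_policy(h, state, actions, requirements, counts):
    """Hypothetical myopic rule: play the action with the largest remaining deficit.

    Ties go to the first action in `actions`, because max() keeps the first maximum.
    """
    return max(actions, key=lambda a: deficit(requirements, counts, (h, state, a)))


def episodes_until_covered(transition, requirements, horizon, actions,
                           initial_state=0, max_episodes=1000):
    """Run episodes until every required visit count is met.

    Returns (number_of_episodes, counts), or (None, counts) if the
    requirements are still unmet after `max_episodes` episodes.
    """
    counts = Counter()
    for episode in range(1, max_episodes + 1):
        state = initial_state
        for h in range(horizon):
            action = greedy_policy(h, state, actions, requirements, counts)
            counts[(h, state, action)] += 1
            state = transition(h, state, action)
        # The coverage requirement: every listed triple has enough visits.
        if all(counts[key] >= need for key, need in requirements.items()):
            return episode, counts
    return None, counts


def toy_transition(h, state, action):
    """Assumed toy dynamics: the next state is simply the chosen action."""
    return action


# Required visit counts, keyed by (step, state, action); an assumed example.
toy_requirements = {(0, 0, 0): 3, (0, 0, 1): 2, (1, 0, 0): 1, (1, 1, 1): 2}

episodes, visit_counts = episodes_until_covered(
    toy_transition, toy_requirements, horizon=2, actions=(0, 1)
)
print(episodes)  # 5
```

In this toy instance five episodes is also the minimum possible, since step 0 alone requires 3 + 2 = 5 visits and each episode contributes exactly one. A myopic rule like this can stall when a required triple is only reachable through actions that have no remaining requirement of their own, which hints at why the general problem, with stochastic transitions, calls for more careful planning.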

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems