Reward-Free Exploration for Reinforcement Learning

Chi Jin; Akshay Krishnamurthy; Max Simchowitz; Tiancheng Yu

arXiv:2002.02794 · cs.LG · February 10, 2020 · 26 cites

TL;DR

This paper introduces a reward-free exploration framework for reinforcement learning: after a single exploration phase conducted without a pre-specified reward function, the agent can compute near-optimal policies for many different reward functions.

Contribution

The paper proposes a novel reward-free RL framework and an efficient exploration algorithm with near-optimal sample complexity, applicable to multiple reward functions without prior reward specification.

Findings

01

Achieves exploration with $\tilde{O}(S^{2} A \, \mathrm{poly}(H) / \epsilon^{2})$ episodes

02

Provides a nearly-matching lower bound on sample complexity

03

Compatible with black-box planning algorithms like value iteration

Abstract

Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new "reward-free RL" framework. In the exploration phase, the agent first collects trajectories from an MDP $M$ without a pre-specified reward function. After exploration, it is tasked with computing near-optimal policies for $M$ under a collection of given reward functions. This framework is particularly suitable when there are many reward functions of interest, or when the reward function is shaped by an external agent to elicit desired behavior. We give an efficient algorithm that conducts $\tilde{O}(S^{2} A \, \mathrm{poly}(H) / \epsilon^{2})$ episodes of exploration and returns $\epsilon$-suboptimal policies for an arbitrary number…
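The two-phase protocol described in the abstract can be illustrated with a small tabular sketch. This is not the paper's algorithm: the uniform-random exploration policy, the 3-state MDP `P_true`, the fixed initial state, the two reward tables, and all function names are assumptions made for illustration. The sketch only shows the interface: collect reward-free transitions once, estimate a model, then hand that model to a black-box planner (here, finite-horizon value iteration) for each reward function.

```python
import random

def explore(P, S, A, H, episodes, rng):
    """Phase 1: collect transition counts with no reward signal.
    Uniform-random actions are a placeholder, NOT the paper's exploration strategy."""
    counts = [[[0] * S for _ in range(A)] for _ in range(S)]
    for _ in range(episodes):
        s = 0  # fixed initial state (assumption)
        for _ in range(H):
            a = rng.randrange(A)
            s_next = rng.choices(range(S), weights=P[s][a])[0]
            counts[s][a][s_next] += 1
            s = s_next
    return counts

def estimate_model(counts, S, A):
    """Empirical transition probabilities; unvisited (s, a) pairs default to uniform."""
    P_hat = []
    for s in range(S):
        row = []
        for a in range(A):
            n = sum(counts[s][a])
            row.append([c / n for c in counts[s][a]] if n else [1.0 / S] * S)
        P_hat.append(row)
    return P_hat

def plan(P, R, S, A, H):
    """Phase 2: finite-horizon value iteration. Returns (policy[h][s], V at step 0)."""
    V = [0.0] * S
    policy = []
    for _ in range(H):  # backward induction from the last step to the first
        Q = [[R[s][a] + sum(p * v for p, v in zip(P[s][a], V)) for a in range(A)]
             for s in range(S)]
        policy.append([max(range(A), key=lambda a: Q[s][a]) for s in range(S)])
        V = [max(Q[s]) for s in range(S)]
    policy.reverse()  # policy[0] is now the first-step decision rule
    return policy, V

def evaluate(P, R, policy, S, H):
    """Exact value of a step-dependent policy under transition model P."""
    V = [0.0] * S
    for h in reversed(range(H)):
        pi = policy[h]
        V = [R[s][pi[s]] + sum(p * v for p, v in zip(P[s][pi[s]], V))
             for s in range(S)]
    return V

# Hypothetical 3-state, 2-action MDP: P_true[s][a] is a distribution over next states.
S, A, H = 3, 2, 5
P_true = [
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
    [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]],
    [[0.0, 0.1, 0.9], [0.3, 0.0, 0.7]],
]

rng = random.Random(0)
P_hat = estimate_model(explore(P_true, S, A, H, 5000, rng), S, A)

# Reward functions are revealed only after exploration; R[s][a] in [0, 1].
rewards = {
    "reach_state_2": [[0, 0], [0, 0], [1, 1]],
    "stay_in_state_0": [[1, 1], [0, 0], [0, 0]],
}

gaps = {}
for name, R in rewards.items():
    pi_hat, _ = plan(P_hat, R, S, A, H)       # plan on the estimated model
    _, V_star = plan(P_true, R, S, A, H)      # optimal value on the true model
    gaps[name] = V_star[0] - evaluate(P_true, R, pi_hat, S, H)[0]
    print(f"{name}: suboptimality gap = {gaps[name]:.4f}")
```

The same `P_hat` is reused for every reward table, which is the point of the framework: the exploration cost is paid once, and planning afterwards needs no further interaction with the environment. Random exploration suffices here only because the toy MDP is small and every state is easy to reach.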


Videos

Reward-Free Exploration for Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research