Reward-Free Exploration for Reinforcement Learning
Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

TL;DR
This paper introduces a reward-free exploration framework in reinforcement learning, enabling efficient policy computation for multiple reward functions after a single exploration phase, addressing the challenge of exploration in complex environments.
Contribution
The paper proposes a novel reward-free RL framework and an efficient exploration algorithm with near-optimal sample complexity, applicable to multiple reward functions without prior reward specification.
Findings
Achieves exploration with Õ(S^2 A poly(H) / ε^2) episodes
Provides a nearly-matching lower bound on sample complexity
Compatible with black-box planning algorithms like value iteration
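The two-phase structure behind these findings can be illustrated with a small tabular sketch. This is not the paper's exploration algorithm: `explore_uniform` is a hypothetical placeholder that simply takes uniformly random actions, and the names `estimate_model` and `value_iteration`, the fixed start state 0, and the stationary transition tensor `P` are all assumptions made for illustration. What the sketch does show is the planning side: once a transition model has been estimated without any reward, finite-horizon value iteration can be run as a black box for as many reward functions as desired.

```python
import numpy as np


def explore_uniform(P, H, episodes, rng):
    """Collect transition counts with uniformly random actions.

    Placeholder only: the paper's exploration strategy is more careful.
    P has shape (S, A, S); every episode starts in state 0 (an assumption).
    """
    S, A, _ = P.shape
    counts = np.zeros((S, A, S))
    for _ in range(episodes):
        s = 0
        for _ in range(H):
            a = rng.integers(A)
            s_next = rng.choice(S, p=P[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts


def estimate_model(counts):
    """Empirical transition model; unvisited (s, a) pairs fall back to uniform."""
    S = counts.shape[0]
    totals = counts.sum(axis=2, keepdims=True)
    uniform = np.full_like(counts, 1.0 / S)
    return np.where(totals > 0, counts / np.maximum(totals, 1), uniform)


def value_iteration(P_hat, r, H):
    """Finite-horizon value iteration on a model P_hat for reward r of shape (S, A).

    Returns a non-stationary greedy policy of shape (H, S) and the
    step-0 value function of shape (S,).
    """
    S = P_hat.shape[0]
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = r + P_hat @ V  # (S, A, S) @ (S,) -> (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 2-state MDP: action 0 stays put, action 1 switches state.
    P = np.zeros((2, 2, 2))
    P[0, 0, 0] = P[1, 0, 1] = 1.0
    P[0, 1, 1] = P[1, 1, 0] = 1.0
    H = 3

    # Phase 1: explore once, with no reward function in sight.
    P_hat = estimate_model(explore_uniform(P, H, episodes=200, rng=rng))

    # Phase 2: plan for several rewards against the same estimated model.
    rewards = {
        "prefer_state_1": np.array([[0.0, 0.0], [1.0, 1.0]]),
        "prefer_state_0": np.array([[1.0, 1.0], [0.0, 0.0]]),
    }
    for name, r in rewards.items():
        policy, V = value_iteration(P_hat, r, H)
        print(name, policy[0], V)
```

The planner never touches the environment: all interaction happens in `explore_uniform`, so adding another reward function costs only one more call to `value_iteration`.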
Abstract
Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new "reward-free RL" framework. In the exploration phase, the agent first collects trajectories from an MDP without a pre-specified reward function. After exploration, it is tasked with computing near-optimal policies for a collection of given reward functions. This framework is particularly suitable when there are many reward functions of interest, or when the reward function is shaped by an external agent to elicit desired behavior. We give an efficient algorithm that conducts Õ(S^2 A poly(H) / ε^2) episodes of exploration and returns ε-suboptimal policies for an arbitrary number…
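The guarantee described in the abstract can be written out roughly as follows. The notation is assumed for illustration and does not appear in this summary: $s_1$ is a fixed initial state, $V_1^{\pi}(s_1; r)$ is the $H$-step value of policy $\pi$ under reward $r$, $\hat{\pi}_r$ is the policy returned for $r$ after exploration, $K$ is the number of exploration episodes, and $p$ is a failure probability.

```latex
\[
  \Pr\Big[\, \forall r:\;
    V_1^{*}(s_1; r) - V_1^{\hat{\pi}_r}(s_1; r) \le \epsilon \,\Big]
  \;\ge\; 1 - p,
  \qquad
  K = \tilde{O}\!\left( \frac{S^2 A \,\mathrm{poly}(H)}{\epsilon^2} \right).
\]
```

The quantifier over $r$ sits inside the probability: a single exploration phase is meant to suffice for every reward function simultaneously, which is what separates this setting from running a standard RL algorithm once per reward.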
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
