Exploration via Planning for Information about the Optimal Trajectory

Viraj Mehta; Ian Char; Joseph Abbate; Rory Conlin; Mark D. Boyer; Stefano Ermon; Jeff Schneider; Willie Neiswanger

arXiv:2210.04642·cs.LG·October 11, 2022

TL;DR

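This paper introduces a planning-based exploration method for reinforcement learning that lowers the number of samples needed to learn a policy by choosing actions that maximize expected information gain about the optimal trajectory. It requires 2x fewer samples than strong exploration baselines and 200x fewer than model-free methods.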

Contribution

The authors develop a planning approach that uses both the task objective and the current knowledge about the dynamics to guide exploration in RL, reducing the number of environment samples needed to learn a strong policy.

Findings

01 Requires 2x fewer samples than strong exploration baselines.

02 Requires 200x fewer samples than model-free methods.

03 Effective on diverse low- to medium-dimensional control tasks.

Abstract

Many potential applications of reinforcement learning (RL) are stymied by the large numbers of samples required to learn an effective policy. This is especially true when applying RL to real-world control tasks, e.g. in the sciences or robotics, where executing a policy in the environment is costly. In popular RL algorithms, agents typically explore either by adding stochasticity to a reward-maximizing policy or by attempting to gather maximal information about environment dynamics without taking the given task into account. In this work, we develop a method that allows us to plan for exploration while taking both the task and the current knowledge about the dynamics into account. The key insight to our approach is to plan an action sequence that maximizes the expected information gain about the optimal trajectory for the task at hand. We demonstrate that our method learns strong…
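The key insight in the abstract, scoring candidate queries by expected information gain (EIG) about the optimal trajectory, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a Gaussian process dynamics model with a fixed unit RBF kernel (the abstract does not specify the model), and the names `rbf`, `gp_var`, `eig`, and `sampled_paths` are hypothetical. It also scores single state-action points rather than planning a whole action sequence. Under these assumptions, the EIG at a point is the predictive entropy given the data minus the average predictive entropy after also conditioning on a sampled optimal trajectory. Because a fixed-hyperparameter GP's variance depends only on input locations, only the visited inputs of each sampled trajectory are needed.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Unit-amplitude RBF kernel between rows of a (n, d) and b (m, d)."""
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ls ** 2)

def gp_var(x_query, x_train, noise=1e-2):
    """Predictive variance (including noise) of a zero-mean GP at x_query."""
    k_qq = 1.0 + noise
    if len(x_train) == 0:
        return np.full(len(x_query), k_qq)
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_q = rbf(x_query, x_train)
    sol = np.linalg.solve(K, k_q.T)
    return k_qq - np.sum(k_q * sol.T, axis=1)

def eig(x_query, x_data, sampled_paths, noise=1e-2):
    """Monte Carlo EIG estimate about the optimal trajectory.

    sampled_paths: list of (T, d) arrays, each holding the inputs visited by
    one optimal trajectory sampled from the model posterior (assumed given).
    Constant terms of the Gaussian entropy cancel in the difference.
    """
    h_before = 0.5 * np.log(gp_var(x_query, x_data, noise))
    h_after = np.mean(
        [0.5 * np.log(gp_var(x_query, np.vstack([x_data, p]), noise))
         for p in sampled_paths],
        axis=0,
    )
    return h_before - h_after

# Toy usage: no data yet, two sampled optimal paths near the origin.
x_data = np.zeros((0, 1))
sampled_paths = [np.array([[0.0], [0.5]]), np.array([[0.1], [0.6]])]
x_query = np.array([[0.3], [5.0]])
scores = eig(x_query, x_data, sampled_paths)
```

In this toy setup the query near the sampled paths scores higher than the distant one, which reflects the idea of directing exploration toward regions that are informative about the task rather than about the dynamics everywhere.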

Code & Models

Videos

Exploration via Planning for Information about the Optimal Trajectory · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Data Stream Mining Techniques