Iterative Option Discovery for Planning, by Planning

Kenny Young; Richard S. Sutton

arXiv:2310.01569·cs.AI·December 27, 2023

Open Access

TL;DR

This paper introduces Option Iteration, a method for discovering options (temporal abstractions) for use in planning. It learns a set of locally effective policies, and planning with them improves search efficiency and performance in complex environments.

Contribution

The paper proposes Option Iteration, an approach to option discovery inspired by Expert Iteration. It learns a set of options that together cover the states encountered and a range of future horizons, and uses them to improve planning.

Findings

01. Planning with learned options outperforms primitive action planning.

02. Option Iteration creates a set of locally strong policies.

03. Experimental results show significant improvements in complex environments.

Abstract

Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future. Intuitively, this may be significantly easier as it allows the algorithm to hedge its bets compared to learning a single globally strong policy, which may have complex dependencies on the details of the current state. Having learned such a set of locally…
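The "at least one policy in the set matches the search results" idea can be illustrated with a winner-take-all loss. The following is only a minimal sketch under stated assumptions, not the paper's actual objective: the names `cross_entropy` and `best_option_loss`, the use of cross-entropy, the fixed two-step horizon, and the toy distributions are all hypothetical choices made for illustration.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target and a predicted action distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def best_option_loss(search_targets, option_predictions):
    """Return (loss, index) for the option that best matches the search.

    search_targets: one target action distribution per future step.
    option_predictions: for each option, one predicted distribution per step.
    Assumption: each option is scored by its summed cross-entropy over a
    fixed horizon, and only the best-scoring option contributes to the loss.
    """
    losses = [
        sum(cross_entropy(t, p) for t, p in zip(search_targets, preds))
        for preds in option_predictions
    ]
    best = min(range(len(losses)), key=losses.__getitem__)
    return losses[best], best


# Toy example (hypothetical numbers): the search prefers action 0, then action 1.
targets = [[1.0, 0.0], [0.0, 1.0]]
options = [
    [[0.9, 0.1], [0.2, 0.8]],  # option 0 roughly follows the search
    [[0.1, 0.9], [0.5, 0.5]],  # option 1 does not
]
loss, best = best_option_loss(targets, options)
print(best, round(loss, 4))
```

Because only the minimum over options is penalized in this sketch, a training step based on it would update only the best-matching option in each state, which is one way to let different options specialize to different parts of the state space.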

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques

Methods: AlphaZero