Iterative Option Discovery for Planning, by Planning
Kenny Young, Richard S. Sutton

TL;DR
This paper introduces Option Iteration, a method for discovering options (temporal abstractions) for use in planning. It learns a set of locally effective policies that improve search efficiency and planning performance in complex environments.
Contribution
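The paper proposes Option Iteration, an approach to option discovery analogous to Expert Iteration. Instead of training one policy to match the search results everywhere, it trains a set of option policies so that, for each state encountered, at least one policy matches the search results for some horizon into the future.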
Findings
Planning with learned options outperforms primitive action planning.
Option Iteration learns a set of locally strong policies rather than a single globally strong policy.
Experimental results show significant improvements in complex environments.
Abstract
Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future. Intuitively, this may be significantly easier as it allows the algorithm to hedge its bets compared to learning a single globally strong policy, which may have complex dependencies on the details of the current state. Having learned such a set of locally…
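The idea that "at least one policy in the set matches the search results for some horizon" can be illustrated with a small sketch. This is not the paper's objective or code: the function names `option_losses` and `best_option`, the choice of cumulative cross-entropy as the matching measure, and the toy distributions are all assumptions made for illustration.

```python
import numpy as np

def option_losses(option_probs, search_targets):
    """Cumulative cross-entropy of each option against search targets.

    option_probs:   array (K, H, A), action probabilities of K options
                    at each of H future steps (hypothetical layout).
    search_targets: array (H, A), search-derived action distributions
                    for the same H steps.
    Returns an array (K, H): entry [k, h] is the loss of option k
    accumulated over the first h + 1 steps.
    """
    eps = 1e-12  # guards against log(0)
    step_ce = -(search_targets[None] * np.log(option_probs + eps)).sum(axis=-1)
    return np.cumsum(step_ce, axis=1)

def best_option(option_probs, search_targets, horizon):
    """Index and loss of the option that best matches search up to `horizon`."""
    losses = option_losses(option_probs, search_targets)[:, horizon - 1]
    k = int(np.argmin(losses))
    return k, float(losses[k])

# Toy example: 2 options, 2 steps, 2 actions.
# Search prefers action 0 at the first step and action 1 at the second.
search_targets = np.array([[1.0, 0.0],
                           [0.0, 1.0]])
option_probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],  # option 0: always favors action 0
    [[0.9, 0.1], [0.1, 0.9]],  # option 1: favors action 0, then action 1
])

k, loss = best_option(option_probs, search_targets, horizon=2)
print(k, loss)
```

In this sketch only the best-matching option would be pushed toward the search targets for a given state, so no single option has to be good everywhere. How the paper actually assigns states and horizons to options is not specified in the text above.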
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques
Methods: AlphaZero
