Flexible Option Learning
Martin Klissarov, Doina Precup

TL;DR
This paper extends intra-option learning to deep reinforcement learning so that every option consistent with the current primitive action is updated, not only the option currently executing, improving efficiency and performance in hierarchical RL frameworks.
Contribution
The paper introduces a method for updating multiple options at once in deep RL, namely all options consistent with the primitive action taken, without introducing any additional estimates.
Findings
Significant performance improvements across various domains.
Enhanced data efficiency in hierarchical RL.
Compatibility with existing option-critic algorithms.
Abstract
Temporal abstraction in reinforcement learning (RL) offers the promise of improving generalization and knowledge transfer in complex environments by propagating information more efficiently over time. Although option learning was initially formulated in a way that allows updating many options simultaneously, using off-policy, intra-option learning (Sutton, Precup & Singh, 1999), many of the recent hierarchical reinforcement learning approaches only update a single option at a time: the option currently executing. We revisit and extend intra-option learning in the context of deep reinforcement learning, in order to enable updating all options consistent with current primitive action choices, without introducing any additional estimates. Our method can therefore be naturally adopted in most hierarchical RL frameworks. When we combine our approach with the option-critic algorithm for…
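To make the idea of "updating all options consistent with current primitive action choices" concrete, here is a minimal sketch of the classical tabular intra-option Q-learning update that the abstract cites (Sutton, Precup & Singh, 1999). It is not the paper's deep RL method. It assumes deterministic option policies, and the function name `intra_option_q_update` and the arrays `option_actions` and `beta` are hypothetical names chosen for illustration.

```python
import numpy as np

def intra_option_q_update(Q, option_actions, beta, s, a, r, s_next,
                          alpha=0.1, gamma=0.99):
    """Tabular intra-option Q-learning step (illustrative sketch).

    Q              : (n_states, n_options) option-value table, updated in place.
    option_actions : (n_options, n_states) ints; the action each option's
                     deterministic policy would take in each state (assumption).
    beta           : (n_options, n_states) termination probabilities.
    Returns the indices of the options that were updated.
    """
    # Every option whose policy agrees with the action actually taken is
    # "consistent" with the transition, whether or not it was executing.
    consistent = np.flatnonzero(option_actions[:, s] == a)

    # Value upon arrival in s_next: continue the option with prob. 1 - beta,
    # otherwise terminate and switch to the greedy option.
    b = beta[consistent, s_next]
    u = (1.0 - b) * Q[s_next, consistent] + b * Q[s_next].max()

    # One temporal-difference step applied to all consistent options at once.
    Q[s, consistent] += alpha * (r + gamma * u - Q[s, consistent])
    return consistent


# Usage: three options over two states; options 0 and 1 both choose action 0
# in state 0, so a single transition updates both of them.
Q = np.zeros((2, 3))
option_actions = np.array([[0, 0], [0, 1], [1, 1]])
beta = np.full((3, 2), 0.5)
updated = intra_option_q_update(Q, option_actions, beta,
                                s=0, a=0, r=1.0, s_next=1,
                                alpha=0.5, gamma=0.9)
```

With a single-option update only the executing option would learn from this transition; here both agreeing options do, which is the source of the data-efficiency argument. With stochastic option policies, agreement would have to be weighted by action probabilities rather than tested for equality, which this sketch does not cover.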
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · Advanced Bandit Algorithms Research
