Learnings Options End-to-End for Continuous Action Tasks
Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup

TL;DR
This paper introduces an end-to-end learning approach for temporally extended actions in continuous tasks using the options framework, employing an option-critic architecture trained with proximal policy optimization, with promising results on Mujoco domains.
Contribution
It presents a novel end-to-end method for learning options in continuous tasks, integrating the option-critic architecture with PPO for improved training.
Findings
Results on Mujoco continuous-control domains are promising.
The results raise open questions about when a given option should be used, an issue directly connected to initiation sets.
Abstract
We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.
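The two ingredients named in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it shows the standard PPO clipped surrogate objective and the deliberation-cost-adjusted advantage commonly used in the option-critic termination update. The function names, the `clip_eps` default, and the sample numbers are assumptions made for this example.

```python
import math


def ppo_clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names are hypothetical.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def termination_advantage(q_option, v_state, deliberation_cost):
    """Advantage of continuing the current option, plus a deliberation cost.

    A positive value discourages terminating; a larger deliberation cost
    therefore favors longer-running options. Assumed form: Q - V + eta.
    """
    return q_option - v_state + deliberation_cost


if __name__ == "__main__":
    # Illustrative numbers only.
    print(ppo_clipped_surrogate([0.0, math.log(2.0)], [0.0, 0.0], [1.0, 1.0]))
    print(termination_advantage(1.0, 1.5, 0.25))
```

In this sketch the same clipped objective could be applied to each option's intra-option policy, while the deliberation cost only enters the termination signal; how the paper combines the two in detail is not specified in the abstract.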
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications
