Learnings Options End-to-End for Continuous Action Tasks

Martin Klissarov; Pierre-Luc Bacon; Jean Harb; Doina Precup

arXiv:1712.00004 · cs.LG · December 4, 2017 · 33 cites



TL;DR

This paper learns temporally extended actions end-to-end for continuous-action tasks within the options framework. It trains an option-critic architecture with a deliberation cost using proximal policy optimization (PPO), and reports promising results on MuJoCo domains.

Contribution

It presents an end-to-end method for learning options in continuous-action tasks, training the option-critic architecture with PPO instead of vanilla policy gradient.

Findings

01

Promising results on MuJoCo domains.

02

Raises questions about when a given option should be used, an issue directly connected to initiation sets.

03

Demonstrates effectiveness of the approach in continuous control.

Abstract

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on MuJoCo domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.
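The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard PPO clipped surrogate and the common form of a deliberation cost, in which a constant is added to the option advantage used by the termination update. The function names, the default clip range of 0.2, and the default cost of 0.01 are hypothetical choices made for illustration only.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over samples.

    r is the probability ratio pi_new(a|s) / pi_old(a|s), computed from
    log-probabilities. clip_eps=0.2 is an assumed default, not a value
    taken from the paper.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the min keeps the objective pessimistic about large ratio changes.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def termination_advantage(q_option, v_state, deliberation_cost=0.01):
    """Option advantage shifted by a deliberation cost (assumed form).

    q_option: value of continuing the current option in the next state.
    v_state:  value of the next state over all options.
    A positive result favours continuing, so the termination probability
    would be pushed down; the cost biases against frequent switching.
    """
    return q_option - v_state + deliberation_cost
```

In this sketch, a larger deliberation cost makes the shifted advantage positive more often, which discourages termination and yields longer-lived options.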

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications