Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search

Jakob J. Hollenstein; Erwan Renaudo; Matteo Saveriano; Justus Piater

arXiv:2010.12974 · cs.LG · October 27, 2020 · 1 citation

Open Access

TL;DR

This paper introduces PPS (Planning for Policy Search), a model-based reinforcement learning method that uses a kinodynamic planner to guide exploration in continuous domains and is reported to outperform state-of-the-art deep reinforcement learning (D-RL) methods on underactuated systems.

Contribution

The paper presents PPS, a novel approach combining kinodynamic planning with offline policy learning to improve exploration and policy quality in continuous RL tasks.

Findings

01 PPS collects data from a wider region of the state space than standard D-RL methods.

02 PPS discovers better policies through more diverse training data.

03 PPS outperforms state-of-the-art D-RL in underactuated systems.

Abstract

Local policy search is performed by most Deep Reinforcement Learning (D-RL) methods, which increases the risk of getting trapped in a local minimum. Furthermore, the availability of a simulation model is not fully exploited in D-RL even in simulation-based training, which potentially decreases efficiency. To better exploit simulation models in policy search, we propose to integrate a kinodynamic planner in the exploration strategy and to learn a control policy in an offline fashion from the generated environment interactions. We call the resulting model-based reinforcement learning method PPS (Planning for Policy Search). We compare PPS with state-of-the-art D-RL methods in typical RL settings including underactuated systems. The comparison shows that PPS, guided by the kinodynamic planner, collects data from a wider region of the state space. This generates training data that helps PPS…
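
The two-stage idea in the abstract (a planner explores through the simulator, then a policy is fit offline from the logged interactions) can be illustrated with a toy sketch. Nothing below is taken from the paper: the 1-D double-integrator simulator, the RRT-style tree growth, the linear policy, and the least-squares fit are all stand-in assumptions, and every function name is hypothetical.

```python
import random

DT = 0.1      # integration step of the toy simulator (assumed)
A_MAX = 1.0   # bound on the acceleration command (assumed)


def dist2(p, q):
    """Squared Euclidean distance between two (position, velocity) states."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def simulate(state, action, steps=5):
    """Hypothetical simulator: Euler-integrate a 1-D double integrator."""
    x, v = state
    for _ in range(steps):
        x, v = x + v * DT, v + action * DT
    return (x, v)


def plan_and_collect(start, n_iters=300, seed=0):
    """RRT-style exploration: grow a tree through the simulator, log transitions."""
    rng = random.Random(seed)
    nodes, parents, transitions = [start], [None], []
    for _ in range(n_iters):
        # Sample a random target state and extend the nearest tree node toward it.
        target = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        i = min(range(len(nodes)), key=lambda k: dist2(nodes[k], target))
        # Try a few random controls; keep the one that ends closest to the target.
        candidates = [rng.uniform(-A_MAX, A_MAX) for _ in range(4)]
        action = min(candidates, key=lambda a: dist2(simulate(nodes[i], a), target))
        nxt = simulate(nodes[i], action)
        nodes.append(nxt)
        parents.append(i)
        transitions.append((nodes[i], action, nxt))
    return nodes, parents, transitions


def path_to(nodes, parents, transitions, goal):
    """Return (state, action) pairs along the tree branch ending nearest the goal."""
    j = min(range(len(nodes)), key=lambda k: dist2(nodes[k], goal))
    pairs = []
    while parents[j] is not None:
        state, action, _ = transitions[j - 1]  # transition j-1 produced node j
        pairs.append((state, action))
        j = parents[j]
    return pairs[::-1]


def fit_linear_policy(pairs, goal, ridge=1e-6):
    """Offline step: ridge-regularized least squares for a = w1*(gx-x) + w2*(gv-v)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for (x, v), a in pairs:
        f1, f2 = goal[0] - x, goal[1] - v
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        b1 += f1 * a
        b2 += f2 * a
    s11 += ridge
    s22 += ridge
    det = s11 * s22 - s12 * s12  # positive because of the ridge term
    w1 = (b1 * s22 - b2 * s12) / det
    w2 = (b2 * s11 - b1 * s12) / det

    def policy(state):
        a = w1 * (goal[0] - state[0]) + w2 * (goal[1] - state[1])
        return max(-A_MAX, min(A_MAX, a))  # clip to the control bounds

    return policy


goal = (1.0, 0.0)
nodes, parents, transitions = plan_and_collect((0.0, 0.0))
policy = fit_linear_policy(path_to(nodes, parents, transitions, goal), goal)
```

The policy here is fit only from data the planner already logged, with no further simulator calls, which is the sense of "offline" used in this sketch. A deep policy trained on all logged transitions would be closer in spirit to a D-RL setting, but the abstract does not specify the learner, so the linear fit is purely illustrative.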

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · Adversarial Robustness in Machine Learning