Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search
Jakob J. Hollenstein, Erwan Renaudo, Matteo Saveriano, Justus Piater

TL;DR
This paper introduces PPS, a model-based reinforcement learning method that integrates kinodynamic planning into exploration to improve policy discovery in continuous domains, outperforming state-of-the-art D-RL methods on underactuated systems.
Contribution
The paper presents PPS, a novel approach combining kinodynamic planning with offline policy learning to improve exploration and policy quality in continuous RL tasks.
Findings
PPS explores a wider region of the state space than standard D-RL methods.
PPS discovers better policies thanks to more diverse training data.
PPS outperforms state-of-the-art D-RL methods on underactuated systems.
Abstract
Most Deep Reinforcement Learning (D-RL) methods perform local policy search, which increases the risk of getting trapped in a local minimum. Furthermore, even in simulation-based training, D-RL does not fully exploit the availability of a simulation model, which can reduce efficiency. To better exploit simulation models in policy search, we propose to integrate a kinodynamic planner into the exploration strategy and to learn a control policy offline from the generated environment interactions. We call the resulting model-based reinforcement learning method PPS (Planning for Policy Search). We compare PPS with state-of-the-art D-RL methods in typical RL settings, including underactuated systems. The comparison shows that PPS, guided by the kinodynamic planner, collects data from a wider region of the state space. This generates training data that helps PPS…
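To make the exploration idea in the abstract concrete, here is a minimal sketch that contrasts planner-driven data collection with random-action rollouts under the same budget of simulator steps. It assumes a toy pendulum simulator of our own (not one of the paper's benchmarks) and a basic kinodynamic RRT that samples random constant torques; neither is claimed to be the planner or setup used in PPS. All names (`step`, `rrt_explore`, `random_rollouts`, `coverage`) and constants are hypothetical. Coverage is measured as the fraction of occupied cells in a 20×20 grid over angle and angular velocity.

```python
import math
import random

# Hypothetical toy setup, not the paper's benchmark.
DT = 0.05          # integration step in seconds (assumed)
MAX_TORQUE = 2.0   # actuation limit, too weak to lift the pendulum directly
MAX_SPEED = 8.0    # angular-velocity clip in rad/s


def step(state, u):
    """One step of a toy pendulum simulator (theta = 0 is upright)."""
    theta, omega = state
    u = max(-MAX_TORQUE, min(MAX_TORQUE, u))
    omega += (15.0 * math.sin(theta) + 3.0 * u) * DT  # g = 10, m = l = 1 (assumed)
    omega = max(-MAX_SPEED, min(MAX_SPEED, omega))
    theta = (theta + omega * DT + math.pi) % (2 * math.pi) - math.pi  # wrap angle
    reward = -(theta ** 2 + 0.1 * omega ** 2 + 0.001 * u ** 2)
    return (theta, omega), reward


def distance(a, b):
    """Wrapped angle difference plus velocity difference rescaled to a similar range."""
    dtheta = (a[0] - b[0] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(dtheta, (a[1] - b[1]) / MAX_SPEED * math.pi)


def rrt_explore(start, budget, rng, n_controls=5, steps_per_edge=5):
    """Basic kinodynamic RRT: sample a target state, take the nearest tree node,
    try a few random constant torques from it and keep the end state closest to
    the target as a new node. Every simulated transition is stored."""
    nodes = [start]
    transitions = []
    while len(transitions) < budget:
        target = (rng.uniform(-math.pi, math.pi), rng.uniform(-MAX_SPEED, MAX_SPEED))
        nearest = min(nodes, key=lambda n: distance(n, target))
        best_end = None
        for _ in range(n_controls):
            u = rng.uniform(-MAX_TORQUE, MAX_TORQUE)
            s = nearest
            for _ in range(steps_per_edge):
                s_next, r = step(s, u)
                transitions.append((s, u, r, s_next))
                s = s_next
            if best_end is None or distance(s, target) < distance(best_end, target):
                best_end = s
        nodes.append(best_end)
    return transitions[:budget]


def random_rollouts(start, budget, rng, horizon=200):
    """Baseline: episodes of uniformly random torques, each restarted from start."""
    transitions = []
    while len(transitions) < budget:
        s = start
        for _ in range(horizon):
            u = rng.uniform(-MAX_TORQUE, MAX_TORQUE)
            s_next, r = step(s, u)
            transitions.append((s, u, r, s_next))
            s = s_next
    return transitions[:budget]


def coverage(transitions, bins=20):
    """Fraction of (theta, omega) grid cells visited by the stored start states."""
    cells = set()
    for s, _, _, _ in transitions:
        i = min(int((s[0] + math.pi) / (2 * math.pi) * bins), bins - 1)
        j = min(int((s[1] + MAX_SPEED) / (2 * MAX_SPEED) * bins), bins - 1)
        cells.add((i, j))
    return len(cells) / (bins * bins)


if __name__ == "__main__":
    rng = random.Random(0)
    start = (math.pi, 0.0)  # hanging down, at rest
    budget = 5000           # simulator steps granted to each strategy
    print("planner coverage:", coverage(rrt_explore(start, budget, rng)))
    print("random  coverage:", coverage(random_rollouts(start, budget, rng)))
```

The transitions returned by `rrt_explore` stand in for the planner-generated environment interactions that PPS learns from offline; the offline policy-learning step itself is not shown. The sketch makes no claim about the coverage values it prints, it only offers one way to probe the wider-coverage claim on a toy problem.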
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · Adversarial Robustness in Machine Learning
