Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning

Yevgen Chebotar; Karol Hausman; Marvin Zhang; Gaurav Sukhatme; Stefan Schaal; Sergey Levine

arXiv:1703.03078·cs.RO·June 20, 2017·86 cites

Open Access

TL;DR

This paper combines model-based (LQR) and model-free (PI2) reinforcement learning updates for time-varying linear-Gaussian policies, aiming for data-efficient policy learning on robotic manipulation tasks with complex, unknown dynamics.

Contribution

The authors develop a unified framework that integrates model-based updates derived from the linear quadratic regulator (LQR) into path integral policy improvement (PI2), and that extends to deep neural network policies via guided policy search (GPS).

Findings

01 Achieves high sample efficiency comparable to model-based methods

02 Demonstrates comparable or better performance on challenging manipulation tasks

03 Validates effectiveness through both simulation and real-world experiments

Abstract

Reinforcement learning (RL) algorithms for real-world robotic applications need a data-efficient learning process and the ability to handle complex, unknown dynamical systems. These requirements are handled well by model-based and model-free RL approaches, respectively. In this work, we aim to combine the advantages of these two types of methods in a principled manner. By focusing on time-varying linear-Gaussian policies, we enable a model-based algorithm based on the linear quadratic regulator (LQR) that can be integrated into the model-free framework of path integral policy improvement (PI2). We can further combine our method with guided policy search (GPS) to train arbitrary parameterized policies such as deep neural networks. Our simulation and real-world experiments demonstrate that this method can solve challenging manipulation tasks with comparable or better performance than…
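As a rough illustration of the model-free half of the method, the sketch below shows a generic PI2-style update: sampled trajectories are weighted by a softmax over their negative cost-to-go, and the weighted mean and covariance of the sampled controls become the new open-loop control distribution at each time step. This is a minimal sketch, not the authors' implementation. The function name `pi2_update`, the array layout, and the fixed temperature `eta` are assumptions made for illustration, and the LQR-based model-based update and the GPS step are not shown.

```python
import numpy as np


def pi2_update(controls, costs, eta=1.0):
    """Generic PI2-style update (illustrative sketch, hypothetical helper).

    controls: array of shape (N, T, dU), sampled controls for N trajectories.
    costs:    array of shape (N, T), per-step costs of those trajectories.
    eta:      temperature; assumed fixed here for simplicity.
    """
    # Cost-to-go S[i, t]: sum of costs from step t to the end of trajectory i.
    cost_to_go = np.cumsum(costs[:, ::-1], axis=1)[:, ::-1]

    # Subtract the per-step minimum before exponentiating for numerical stability.
    shifted = cost_to_go - cost_to_go.min(axis=0, keepdims=True)
    weights = np.exp(-shifted / eta)
    weights /= weights.sum(axis=0, keepdims=True)  # each column sums to 1

    # Probability-weighted mean and covariance of the controls at each step.
    mean = np.einsum("nt,ntd->td", weights, controls)
    diff = controls - mean[None]
    cov = np.einsum("nt,ntd,nte->tde", weights, diff, diff)
    return mean, cov, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sampled_controls = rng.normal(size=(20, 10, 2))  # synthetic data
    sampled_costs = rng.uniform(size=(20, 10))       # synthetic data
    new_mean, new_cov, w = pi2_update(sampled_controls, sampled_costs, eta=0.5)
    print(new_mean.shape, new_cov.shape)
```

A smaller `eta` concentrates the weights on the lowest-cost samples, making the update greedier; a larger `eta` keeps the new distribution closer to a plain average of the samples.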

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Autonomous Vehicle Technology and Safety