TL;DR
MoPAC is a hybrid model-based/model-free deep reinforcement learning method that combines model predictive rollouts with policy optimization, reducing both model bias and the number of physical interactions needed for robot skill acquisition.
Contribution
The paper introduces MoPAC, a novel hybrid model-based/model-free reinforcement learning algorithm that enhances sample efficiency and reduces model bias for robot training.
Findings
MoPAC outperforms state-of-the-art methods in simulation tasks.
MoPAC successfully trains a physical robotic hand for complex manipulation tasks.
The approach reduces physical interactions needed during training.
Abstract
Substantial advancements to model-based reinforcement learning algorithms have been impeded by the model bias induced by the collected data, which generally hurts performance. Meanwhile, their inherent sample efficiency warrants utility for most robot applications, limiting potential damage to the robot and its environment during training. Inspired by information theoretic model predictive control and advances in deep reinforcement learning, we introduce Model Predictive Actor-Critic (MoPAC), a hybrid model-based/model-free method that combines model predictive rollouts with policy optimization so as to mitigate model bias. MoPAC leverages optimal trajectories to guide policy learning, but explores via its model-free method, allowing the algorithm to learn more expressive dynamics models. This combination guarantees optimal skill learning up to an approximation error and reduces necessary…
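The abstract names information theoretic model predictive control as one inspiration. As a rough illustration of that family of planners only (this is not the authors' implementation and not the full MoPAC algorithm), the sketch below performs one sampling-based update of an action sequence: perturb a nominal plan, roll each candidate out through a dynamics model, and average the candidates with exponentiated-cost weights. All names here (`mppi_step`, `rollout_cost`, `toy_dynamics`, `toy_cost`) and the 1-D point system are hypothetical, chosen for the example; in a learned-model setting `dynamics` would be the trained model.

```python
import numpy as np


def rollout_cost(state, actions, dynamics, cost):
    """Accumulate cost while rolling an action sequence through the model."""
    s = np.array(state, dtype=float)
    total = 0.0
    for a in actions:
        total += cost(s, a)
        s = dynamics(s, a)
    return total


def mppi_step(state, nominal, dynamics, cost, num_samples=256,
              noise_std=0.5, temperature=1.0, rng=None):
    """One sampling-based update of a nominal plan (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Perturb the nominal plan with Gaussian noise, one candidate per sample.
    noise = rng.normal(0.0, noise_std, size=(num_samples,) + nominal.shape)
    candidates = nominal[None] + noise
    # Score every candidate plan with a model rollout.
    costs = np.array([rollout_cost(state, c, dynamics, cost) for c in candidates])
    # Exponentiated-cost weights; subtracting the minimum avoids underflow.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # The updated plan is the weighted average of the candidates.
    return np.einsum("k,kha->ha", weights, candidates)


# Hypothetical toy system: a 1-D point that should move to the origin.
def toy_dynamics(s, a):
    return s + a


def toy_cost(s, a):
    return float(s @ s + 0.01 * (a @ a))


start = np.array([2.0])
nominal = np.zeros((5, 1))  # horizon of 5 steps, 1-D actions
updated = mppi_step(start, nominal, toy_dynamics, toy_cost,
                    rng=np.random.default_rng(0))
```

In a hybrid scheme of the kind the abstract describes, trajectories planned this way would serve as guidance for a model-free actor-critic learner; how MoPAC couples the two is specified in the paper, not in this sketch.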
