TL;DR
Dream-MPC is a gradient-based model predictive control (MPC) method that improves policy performance in continuous control tasks by optimizing policy-generated trajectories through a learned world model. It outperforms gradient-free MPC on 24 control tasks.
Contribution
The paper proposes a gradient-based MPC method that reuses optimized actions and incorporates uncertainty regularization, making planning more efficient and more effective than prior gradient-free methods.
Findings
Dream-MPC outperforms gradient-free MPC on 24 control tasks.
Planning with Dream-MPC significantly improves on the performance of the underlying policy.
The approach is computationally more efficient than gradient-free optimization for high-dimensional tasks.
Abstract
State-of-the-art model-based Reinforcement Learning (RL) approaches use gradient-free, population-based methods for planning, learned policy networks, or a combination of policy networks and planning. Hybrid approaches that combine Model Predictive Control (MPC) with a learned model and a policy prior to leverage the advantages of both paradigms have shown promising results. However, these approaches typically rely on gradient-free optimization methods, which can be computationally expensive for high-dimensional control tasks. While gradient-based methods are a promising alternative, recent works have shown empirically that they often perform worse than their gradient-free counterparts. We propose Dream-MPC, a novel approach that generates a few candidate trajectories from a rolled-out policy and optimizes each trajectory by gradient ascent using a learned world…
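The planning loop described in the abstract (roll out a policy to get a few candidate trajectories, then refine each one by gradient ascent through a model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the "world model" is a hand-written 1-D linear system with a quadratic reward, the policy prior is a noisy linear feedback law, gradients are computed by a hand-derived backward pass instead of automatic differentiation, and uncertainty regularization is omitted. The names `rollout_return`, `return_gradient`, `policy_candidate`, and `plan` are hypothetical.

```python
import random

# Hypothetical stand-in for a learned world model (not from the paper):
# dynamics s' = A*s + B*u, reward r = -(s')^2 - CTRL_COST*u^2.
A, B, CTRL_COST = 0.9, 0.5, 0.01


def rollout_return(s0, actions):
    """Sum of model-predicted rewards for an action sequence."""
    s, total = s0, 0.0
    for u in actions:
        s = A * s + B * u
        total += -(s * s) - CTRL_COST * u * u
    return total


def return_gradient(s0, actions):
    """Gradient of the return w.r.t. each action (backprop through the rollout)."""
    states, s = [], s0
    for u in actions:  # forward pass: store predicted next states
        s = A * s + B * u
        states.append(s)
    grads, lam = [0.0] * len(actions), 0.0
    for t in reversed(range(len(actions))):
        # lam is the sensitivity of the return to the state reached after action t
        lam = -2.0 * states[t] + A * lam
        grads[t] = B * lam - 2.0 * CTRL_COST * actions[t]
    return grads


def policy_candidate(s0, horizon, rng, gain=0.5, noise=0.3):
    """Roll out an assumed noisy linear policy inside the model."""
    actions, s = [], s0
    for _ in range(horizon):
        u = -gain * s + rng.gauss(0.0, noise)
        actions.append(u)
        s = A * s + B * u
    return actions


def plan(s0, horizon=5, num_candidates=4, steps=20, lr=0.1, seed=0):
    """Refine a few policy-generated candidates by gradient ascent; keep the best."""
    rng = random.Random(seed)
    best_actions, best_return = None, float("-inf")
    for _ in range(num_candidates):
        actions = policy_candidate(s0, horizon, rng)
        for _ in range(steps):
            grads = return_gradient(s0, actions)
            actions = [u + lr * g for u, g in zip(actions, grads)]
        ret = rollout_return(s0, actions)
        if ret > best_return:
            best_actions, best_return = actions, ret
    return best_actions, best_return
```

In a receding-horizon loop, only `best_actions[0]` would be applied to the environment. The remaining actions could be shifted by one step and used to initialize the next planning call, which is one plausible reading of the action reuse mentioned in the Contribution, though the paper's exact scheme is not given here.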
