Policy Optimization for Unknown Systems using Differentiable Model Predictive Control
Riccardo Zuliani, Efe C. Balta, John Lygeros

TL;DR
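This paper proposes a policy optimization framework for MPC that combines differentiable optimization (model-based gradients) with zeroth-order optimization (model-free gradients), giving faster transient performance than fully data-driven methods while keeping convergence guarantees under model uncertainty.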
Contribution
It integrates differentiable optimization with zeroth-order methods for training MPC-based policies, so that convergence guarantees are maintained even when the dynamics model is inaccurate.
Findings
Faster transient performance than fully data-driven methods
Convergence guarantees maintained under model uncertainty
Demonstrated on a nonlinear control task with a 12-dimensional quadcopter model
Abstract
Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.
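To make the idea of mixing model-based and model-free gradient estimates concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: the scalar linear system, the linear feedback policy standing in for an MPC policy, the two-point zeroth-order estimator, the linear blending schedule, and every name and constant below are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): scalar system x+ = a*x + B*u
# with policy u = -k*x. The model used for gradients has the wrong `a`.
A_TRUE, A_MODEL, B, R, T, X0 = 1.2, 1.0, 1.0, 0.1, 20, 1.0


def rollout_cost(k, a):
    """Closed-loop cost sum_t (x_t^2 + R*u_t^2) over T steps for gain k."""
    x, cost = X0, 0.0
    for _ in range(T):
        u = -k * x
        cost += x**2 + R * u**2
        x = a * x + B * u
    return cost


def model_gradient(k, a):
    """Exact dJ/dk through the (possibly wrong) model via forward sensitivities."""
    x, dx, grad = X0, 0.0, 0.0
    for _ in range(T):
        # Stage cost is (1 + R*k^2) * x^2; differentiate w.r.t. k.
        grad += 2 * R * k * x**2 + (1 + R * k**2) * 2 * x * dx
        # Dynamics x+ = (a - B*k) * x and its sensitivity dx+/dk.
        x, dx = (a - B * k) * x, -B * x + (a - B * k) * dx
    return grad


def zeroth_order_gradient(k, rng, delta=1e-2):
    """Two-point estimate from rollouts on the true system (no model needed)."""
    s = rng.choice([-1.0, 1.0])  # random perturbation direction
    diff = rollout_cost(k + delta * s, A_TRUE) - rollout_cost(k - delta * s, A_TRUE)
    return s * diff / (2 * delta)


def optimize(k0=0.5, steps=200, lr=0.05, blend_steps=50, seed=0):
    """Gradient descent that shifts weight from the model-based gradient
    to the zeroth-order gradient (assumed linear schedule)."""
    rng = np.random.default_rng(seed)
    k = k0
    for i in range(steps):
        alpha = min(1.0, i / blend_steps)  # 0 = model only, 1 = data only
        g = (1 - alpha) * model_gradient(k, A_MODEL) \
            + alpha * zeroth_order_gradient(k, rng)
        k -= lr * g
    return k


k_final = optimize()
print(k_final, rollout_cost(k_final, A_TRUE))
```

In this sketch the model-based gradient gives cheap early progress toward the model's optimum, while the zeroth-order term, computed from rollouts on the true system, removes the bias caused by the wrong model as its weight grows. How the paper actually combines the two estimates, and under which conditions its guarantees hold, should be taken from the paper itself.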
