Policy Optimization for Unknown Systems using Differentiable Model Predictive Control

Riccardo Zuliani; Efe C. Balta; John Lygeros

arXiv:2511.11308·eess.SY·April 21, 2026

TL;DR

This paper proposes a policy optimization framework for MPC-based policies that combines differentiable optimization with zeroth-order optimization, achieving faster transient performance than fully data-driven methods while keeping convergence guarantees under model uncertainty.

Contribution

The paper integrates differentiable optimization (model-based gradients) with zeroth-order (model-free) gradient estimation for training MPC policies, retaining convergence guarantees even when the system model is inaccurate.

Findings

01. Faster transient performance compared to fully data-driven methods

02. Maintains convergence guarantees under model uncertainty

03. Effective on a 12-dimensional quadcopter control task

Abstract

Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.
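
To make the two gradient sources mentioned in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It shows a standard two-point random-direction zeroth-order gradient estimator, which needs only cost evaluations, next to an analytic gradient from a deliberately mismatched model. The toy quadratic costs, the names `zeroth_order_gradient`, `blended_gradient`, `true_cost`, `model_grad`, and the convex-combination blend with `weight` are all assumptions introduced for illustration; the abstract does not specify how the paper combines the two estimates.

```python
import numpy as np


def zeroth_order_gradient(cost, theta, delta=1e-2, num_samples=50, rng=None):
    """Two-point random-direction estimate of the gradient of `cost` at `theta`.

    Uses only evaluations of `cost` (model-free), so it stands in for a
    gradient estimated from closed-loop experiments on the real system.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        # Sample a direction uniformly on the unit sphere.
        u = rng.standard_normal(theta.shape)
        u /= np.linalg.norm(u)
        # Central finite difference of the cost along u.
        slope = (cost(theta + delta * u) - cost(theta - delta * u)) / (2.0 * delta)
        # Scaling by the dimension makes the estimate unbiased for smooth costs
        # up to O(delta^2) terms.
        grad += theta.size * slope * u
    return grad / num_samples


def blended_gradient(model_grad, zo_grad, weight):
    """Hypothetical convex combination of the two estimates (an assumption)."""
    return weight * np.asarray(model_grad) + (1.0 - weight) * np.asarray(zo_grad)


# Toy stand-ins (assumptions): the "real" closed-loop cost and a mismatched model.
target_true = np.array([1.0, 1.0, 1.0])
target_model = np.array([0.8, 1.1, 1.0])  # model error shifts the minimizer


def true_cost(theta):
    return float(np.sum((theta - target_true) ** 2))


def model_grad(theta):
    # Analytic gradient of the model's cost; a stand-in for differentiating
    # through an MPC policy built on an inaccurate model.
    return 2.0 * (theta - target_model)


theta = np.array([0.0, 2.0, -1.0])
g_zo = zeroth_order_gradient(true_cost, theta, num_samples=2000,
                             rng=np.random.default_rng(0))
g_mix = blended_gradient(model_grad(theta), g_zo, weight=0.5)
```

In this toy setting the model-based gradient is cheap and noise-free but biased by the model error, while the zeroth-order estimate is noisy but targets the true cost. That trade-off is the motivation the abstract gives for combining the two approaches.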
