Equivalence of Optimality Criteria for Markov Decision Process and Model Predictive Control
Arash Bahari Kordabad, Mario Zanon, Sebastien Gros

TL;DR
This paper demonstrates that optimal policies and value functions in MDPs can be represented by finite-horizon optimal control problems, enabling parameterization and tuning of MPC schemes via reinforcement learning.
Contribution
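It establishes that, with a suitably chosen stage cost and terminal cost, a finite-horizon undiscounted OCP (including an MPC scheme built on an inexact model) can reproduce the optimal policy and value functions of an MDP, which justifies tuning a fully parameterized MPC scheme with RL.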
Findings
Analytical verification in LQR case
Simulation results on nonlinear examples
Parameterization of MPC schemes for RL tuning
Abstract
This paper shows that the optimal policy and value functions of a Markov Decision Process (MDP), either discounted or not, can be captured by a finite-horizon undiscounted Optimal Control Problem (OCP), even if based on an inexact model. This can be achieved by selecting a proper stage cost and terminal cost for the OCP. A very useful particular case of OCP is a Model Predictive Control (MPC) scheme where a deterministic (possibly nonlinear) model is used to reduce the computational complexity. This observation leads us to parameterize an MPC scheme fully, including the cost function. In practice, Reinforcement Learning algorithms can then be used to tune the parameterized MPC scheme. We verify the developed theorems analytically in an LQR case and we investigate some other nonlinear examples in simulations.
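The LQR case mentioned in the abstract can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the paper's own experiment: the system matrices, the discount factor, and all function names are hypothetical, and the model is taken to be deterministic and exact. It uses one natural choice of costs consistent with the abstract, namely the terminal cost T(x) = V*(x) and the modified stage cost L_hat(x, u) = L(x, u) + (gamma - 1) V*(Ax + Bu), and checks that the undiscounted finite-horizon problem then returns the discounted optimal value function and feedback gain.

```python
import numpy as np

# Hypothetical example system (not taken from the paper): a discretized double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)          # state weight of the original stage cost L(x, u) = x'Qx + u'Ru
R = np.array([[0.1]])  # input weight
gamma = 0.9            # discount factor of the original (discounted) problem


def discounted_riccati(A, B, Q, R, gamma, iters=5000):
    """Value iteration for the discounted LQR.

    Returns P with V*(x) = x'Px and the gain K of the optimal policy u = -Kx.
    """
    P = np.zeros_like(Q)
    for _ in range(iters):
        Huu = R + gamma * B.T @ P @ B
        Hux = gamma * B.T @ P @ A
        P = Q + gamma * A.T @ P @ A - Hux.T @ np.linalg.solve(Huu, Hux)
    Huu = R + gamma * B.T @ P @ B
    K = np.linalg.solve(Huu, gamma * B.T @ P @ A)
    return P, K


P_star, K_star = discounted_riccati(A, B, Q, R, gamma)

# Modified stage cost L_hat(x, u) = L(x, u) + (gamma - 1) * V*(Ax + Bu),
# written as quadratic blocks: x'Q_hat x + u'R_hat u + 2 x'S_hat u.
Q_hat = Q + (gamma - 1.0) * A.T @ P_star @ A
R_hat = R + (gamma - 1.0) * B.T @ P_star @ B
S_hat = (gamma - 1.0) * A.T @ P_star @ B


def undiscounted_backward_pass(N):
    """Backward Riccati recursion of the undiscounted N-step OCP.

    Uses the modified stage cost above and the terminal cost x'P_star x.
    Returns the cost-to-go matrix and feedback gain at the first stage.
    """
    P = P_star.copy()
    K = None
    for _ in range(N):
        Hxx = Q_hat + A.T @ P @ A
        Huu = R_hat + B.T @ P @ B
        Hxu = S_hat + A.T @ P @ B
        K = np.linalg.solve(Huu, Hxu.T)
        P = Hxx - Hxu @ K
    return P, K


P0, K0 = undiscounted_backward_pass(N=10)
print("max |P0 - P*| =", np.max(np.abs(P0 - P_star)))
print("max |K0 - K*| =", np.max(np.abs(K0 - K_star)))
```

Both printed differences should be at the level of floating-point round-off for any horizon N, because each backward step minimizes L_hat + V* = L + gamma V*, which is the discounted Bellman equation. With an inexact or stochastic model the stage cost would have to be adapted accordingly, which this sketch does not cover.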
Taxonomy
Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics · Formal Methods in Verification
