Equivalence of Optimality Criteria for Markov Decision Process and Model Predictive Control

Arash Bahari Kordabad; Mario Zanon; Sebastien Gros

arXiv:2210.04302·eess.SY·February 8, 2023

Open Access

TL;DR

This paper demonstrates that optimal policies and value functions in MDPs can be represented by finite-horizon optimal control problems, enabling parameterization and tuning of MPC schemes via reinforcement learning.

Contribution

It establishes a theoretical equivalence between MDP optimality criteria and finite-horizon OCPs, including MPC schemes built on an inexact model, so that a fully parameterized MPC scheme can be tuned with RL.

Findings

01 Analytical verification in LQR case
02 Simulation results on nonlinear examples
03 Parameterization of MPC schemes for RL tuning

Abstract

This paper shows that the optimal policy and value functions of a Markov Decision Process (MDP), either discounted or not, can be captured by a finite-horizon undiscounted Optimal Control Problem (OCP), even if based on an inexact model. This can be achieved by selecting a proper stage cost and terminal cost for the OCP. A very useful particular case of OCP is a Model Predictive Control (MPC) scheme where a deterministic (possibly nonlinear) model is used to reduce the computational complexity. This observation leads us to parameterize an MPC scheme fully, including the cost function. In practice, Reinforcement Learning algorithms can then be used to tune the parameterized MPC scheme. We verify the developed theorems analytically in an LQR case and we investigate some other nonlinear examples in simulations.
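The claim that a suitable stage cost and terminal cost let an undiscounted finite-horizon OCP reproduce the MDP solution, even on a wrong model, can be checked numerically in a scalar LQ setting. The sketch below is an illustration under stated assumptions, not the authors' code: the abstract does not spell out the construction, so the cost choice used here (terminal cost equal to V*, stage cost equal to Q*(s,u) minus V* evaluated at the model's predicted next state) is an assumption, and all names and numbers (a, b, a_hat, b_hat, p_star, and so on) are hypothetical.

```python
# Scalar LQ sketch; every name and number below is a hypothetical choice.
import math

# True system: s+ = a*s + b*u + w, w ~ N(0, sigma^2).
# Stage cost q*s^2 + r*u^2, discount factor gamma.
a, b, q, r, gamma, sigma = 0.9, 0.5, 1.0, 0.5, 0.95, 0.1

# Deliberately wrong deterministic model used inside the OCP:
# s+ = a_hat*s + b_hat*u.
a_hat, b_hat = 0.7, 0.8

# Discounted Riccati fixed point of the true MDP:
# V*(s) = p_star*s^2 + c_star, pi*(s) = -k_star*s.
p_star = q
for _ in range(5000):
    p_star = (q + gamma * a**2 * p_star
              - (gamma * a * b * p_star) ** 2 / (r + gamma * b**2 * p_star))
k_star = gamma * a * b * p_star / (r + gamma * b**2 * p_star)
c_star = gamma * p_star * sigma**2 / (1.0 - gamma)

# Assumed modified stage cost: L_hat(s,u) = Q*(s,u) - V*(a_hat*s + b_hat*u)
#   = l_ss*s^2 + 2*l_su*s*u + l_uu*u^2   (the constant terms cancel).
l_ss = q + gamma * p_star * a**2 - p_star * a_hat**2
l_su = gamma * p_star * a * b - p_star * a_hat * b_hat
l_uu = r + gamma * p_star * b**2 - p_star * b_hat**2

# Undiscounted backward recursion over N steps on the wrong model,
# started from the terminal cost V*.
N = 10
p_ocp, c_ocp = p_star, c_star  # stage cost has no constant, so c_ocp stays c_star
for _ in range(N):
    h_ss = l_ss + p_ocp * a_hat**2
    h_su = l_su + p_ocp * a_hat * b_hat
    h_uu = l_uu + p_ocp * b_hat**2
    k_ocp = h_su / h_uu               # first-stage feedback gain, u = -k_ocp*s
    p_ocp = h_ss - h_su**2 / h_uu     # quadratic coefficient of the OCP value

print("policy gain  MDP vs OCP:", k_star, k_ocp)
print("value coeff. MDP vs OCP:", p_star, p_ocp)
```

With this cost choice each backward step collapses to minimizing Q* over the input, so the gain and value coefficient of the OCP match those of the discounted MDP for any horizon N, regardless of how far a_hat and b_hat are from a and b. The modified stage cost may be indefinite; only the combined term h_uu needs to be positive for the minimization to be well posed.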

Taxonomy

Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics · Formal Methods in Verification