Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming

Dimitri P. Bertsekas

arXiv:2406.00592 · eess.SY · July 2, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a conceptual framework connecting approximate Dynamic Programming, Model Predictive Control, and Reinforcement Learning. The framework is built around two algorithms, off-line training and on-line play, that work in synergy through Newton's method.

Contribution

It presents a novel conceptual framework that links DP, MPC, and RL, emphasizing their common structure and the role of off-line and on-line algorithms in a unified setting.

Findings

01 Bridges the gap between RL and MPC through a shared algorithmic perspective.

02 Provides new insights into MPC stability and adaptability using this unified framework.

03 Highlights the benefits of Newton's method for performance bounds in control algorithms.

Abstract

In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line…
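A minimal sketch of the off-line training / on-line play split, assuming on-line play is implemented as one-step lookahead minimization that uses the off-line cost approximation as its terminal cost. The toy problem, the function names (`transition`, `bellman_sweeps`, `online_play`, `policy_cost`), and all numbers are hypothetical and not taken from the paper; a single value iteration sweep stands in for whatever off-line training method would be used in practice.

```python
# Toy deterministic discounted problem (all names and numbers are illustrative assumptions).
ALPHA = 0.9                     # discount factor
STATES = range(6)               # state 0 is a cost-free absorbing goal
CONTROLS = ("step", "jump")


def transition(x, u):
    """Known model: return (next state, stage cost) for control u at state x."""
    if x == 0:
        return 0, 0.0
    if u == "step":
        return x - 1, 1.0
    return max(x - 3, 0), 2.5


def bellman_sweeps(J, sweeps):
    """Apply the Bellman operator `sweeps` times (plain value iteration)."""
    for _ in range(sweeps):
        J = [
            min(c + ALPHA * J[y] for y, c in (transition(x, u) for u in CONTROLS))
            for x in STATES
        ]
    return J


def online_play(x, J_tilde):
    """One-step lookahead: minimize stage cost plus discounted approximate cost-to-go."""
    def q(u):
        y, c = transition(x, u)
        return c + ALPHA * J_tilde[y]
    return min(CONTROLS, key=q)


def policy_cost(policy, horizon=500):
    """Discounted cost of a stationary policy from every state, by simulating the model."""
    costs = []
    for x0 in STATES:
        x, total, disc = x0, 0.0, 1.0
        for _ in range(horizon):
            x, c = transition(x, policy[x])
            total += disc * c
            disc *= ALPHA
        costs.append(total)
    return costs


# Reference optimal cost, used only to judge the result (not available in practice).
J_star = bellman_sweeps([0.0] * 6, 200)
# Crude stand-in for an off-line trained cost approximation (assumption: one sweep from zero).
J_tilde = bellman_sweeps([0.0] * 6, 1)
# On-line play: the lookahead policy induced by J_tilde, and its true cost.
policy = [online_play(x, J_tilde) for x in STATES]
J_policy = policy_cost(policy)

err = lambda J: max(abs(a - b) for a, b in zip(J, J_star))
print("approximation error:", err(J_tilde))
print("lookahead policy error:", err(J_policy))
```

On this toy instance the cost of the lookahead policy is closer to the optimal cost than the crude approximation it was built from. That is an observation about this example only, not a general guarantee.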

Taxonomy

Topics: Advanced Control Systems Optimization · Adaptive Dynamic Programming Control

Methods: Dense Connections · Accumulating Eligibility Trace · AlphaZero · Feedforward Network · TD Lambda · TD-Gammon