Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming
Dimitri P. Bertsekas

TL;DR
This paper introduces a unified framework connecting approximate Dynamic Programming, Model Predictive Control, and Reinforcement Learning. The framework is built around two algorithms, off-line training and on-line play, which work in synergy through Newton's method.
Contribution
It presents a novel conceptual framework that links DP, MPC, and RL, emphasizing their common structure and the role of off-line and on-line algorithms in a unified setting.
Findings
Bridges the gap between RL and MPC through a shared algorithmic perspective.
Provides new insights into MPC stability and adaptability using this unified framework.
Highlights the benefits of Newton's method for performance bounds in control algorithms.
Abstract
In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line…
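The link between the two algorithms and Newton's method can be made concrete with a small sketch. The example below assumes a scalar linear-quadratic problem; that setting, the constants A, B, Q, R, the starting value K_TILDE, and all function names are illustrative choices, not taken from the abstract. A rough quadratic cost estimate stands in for the output of off-line training, on-line play is modeled as one-step lookahead minimization, and the cost of the resulting policy is compared with a single Newton iteration on the Riccati (Bellman) equation.

```python
import math

# Hypothetical scalar linear-quadratic problem (all values are illustrative):
#   x_{k+1} = A*x_k + B*u_k,  stage cost Q*x^2 + R*u^2.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0


def riccati(k):
    """Bellman (Riccati) operator F applied to the quadratic cost k*x^2."""
    return Q + A**2 * R * k / (R + B**2 * k)


def riccati_slope(k):
    """Derivative F'(k) of the Riccati operator."""
    return (A * R / (R + B**2 * k)) ** 2


def lookahead_gain(k_tilde):
    """On-line play: one-step lookahead with terminal cost k_tilde*x^2.

    Returns the gain L of the resulting linear policy u = L*x.
    """
    return -A * B * k_tilde / (R + B**2 * k_tilde)


def policy_cost(gain):
    """Cost coefficient K_L of the stationary linear policy u = gain*x."""
    closed_loop = A + B * gain
    if abs(closed_loop) >= 1.0:
        raise ValueError("policy is not stabilizing")
    return (Q + R * gain**2) / (1.0 - closed_loop**2)


def newton_step(k_tilde):
    """One Newton iteration for solving the fixed-point equation k = F(k)."""
    return k_tilde - (riccati(k_tilde) - k_tilde) / (riccati_slope(k_tilde) - 1.0)


# Stand-in for the output of off-line training: a rough cost estimate.
K_TILDE = 1.0
# Optimal cost coefficient for A = B = Q = R = 1 (positive root of k^2 - k - 1 = 0).
K_STAR = (1.0 + math.sqrt(5.0)) / 2.0

k_policy = policy_cost(lookahead_gain(K_TILDE))
k_newton = newton_step(K_TILDE)

print(f"cost of lookahead policy: {k_policy:.6f}")
print(f"Newton step from K_TILDE: {k_newton:.6f}")
print(f"error before: {abs(K_TILDE - K_STAR):.6f}, after: {abs(k_policy - K_STAR):.6f}")
```

In this setting the two printed values coincide: the cost of the one-step lookahead policy equals the Newton iterate started from the off-line estimate, so a crude estimate is pulled much closer to the optimal cost in a single on-line step.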
Taxonomy
Topics: Advanced Control Systems Optimization · Adaptive Dynamic Programming Control
Methods: Dense Connections · Accumulating Eligibility Trace · AlphaZero · Feedforward Network · TD Lambda · TD-Gammon
