Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Mohak Bhardwaj; Sanjiban Choudhury; Byron Boots

arXiv:2012.05909 · cs.LG · April 15, 2021 · 1 citation



TL;DR

This paper introduces a hybrid approach combining Model-Predictive Control and value function approximation via reinforcement learning, improving control performance under model bias and enhancing sample efficiency.

Contribution

It presents a framework that integrates MPC with model-free RL, using a parameter $λ$ (similar to the trace decay parameter in TD($λ$)) to trade off MPC's model-based estimates against learned value estimates, supported by theoretical analysis and empirical validation.

Findings

1. Achieves MPC-like performance with biased models
2. Reduces reliance on accurate models over time
3. More sample-efficient than pure model-free RL

Abstract

Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems that uses a model to make predictions about future behavior. For each state encountered, MPC solves an online optimization problem to choose a control action that will minimize future cost. This is a surprisingly effective strategy, but real-time performance requirements warrant the use of simple models. If the model is not sufficiently accurate, then the resulting controller can be biased, limiting performance. We present a framework for improving on MPC with model-free reinforcement learning (RL). The key insight is to view MPC as constructing a series of local Q-function approximations. We show that by using a parameter $λ$, similar to the trace decay parameter in TD($λ$), we can systematically trade off learned value estimates against the local Q-function approximations. We…
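The role of $λ$ can be illustrated with a minimal Python sketch. It is not the authors' estimator: it applies the standard TD($λ$)-style weighting to a single model rollout, under the assumption that an n-step estimate sums n model-predicted costs and then bootstraps from a learned value function. The function name `blended_q_estimate`, its arguments, and the convention that $λ = 0$ leans on the learned values while $λ = 1$ leans on the full model rollout are all assumptions made for illustration.

```python
from typing import Sequence


def blended_q_estimate(
    costs: Sequence[float],
    values: Sequence[float],
    gamma: float,
    lam: float,
) -> float:
    """Lambda-weighted blend of n-step cost-to-go estimates (illustrative only).

    costs[i]  : model-predicted cost at rollout step i, for i = 0..H-1
    values[i] : learned value estimate at the state reached after step i
    gamma     : discount factor
    lam       : blending parameter in [0, 1]
    """
    if not costs or len(costs) != len(values):
        raise ValueError("costs and values must be non-empty and equal in length")

    horizon = len(costs)

    # n-step estimate: n discounted model costs, then bootstrap from the learned value.
    n_step = []
    discounted_cost = 0.0
    for n in range(1, horizon + 1):
        discounted_cost += gamma ** (n - 1) * costs[n - 1]
        n_step.append(discounted_cost + gamma ** n * values[n - 1])

    # TD(lambda)-style weights: (1 - lam) * lam^(n-1) for n < H, and the
    # remaining mass lam^(H-1) on the full-horizon estimate. They sum to 1.
    weights = [(1.0 - lam) * lam ** (n - 1) for n in range(1, horizon)]
    weights.append(lam ** (horizon - 1))

    return sum(w * q for w, q in zip(weights, n_step))


if __name__ == "__main__":
    rollout_costs = [1.0, 1.0, 1.0]    # hypothetical model-predicted costs
    learned_values = [0.0, 0.0, 0.0]   # hypothetical learned value estimates
    for lam in (0.0, 0.5, 1.0):
        print(lam, blended_q_estimate(rollout_costs, learned_values, gamma=1.0, lam=lam))
```

In this sketch, $λ = 0$ returns the one-step estimate, which depends almost entirely on the learned value, while $λ = 1$ returns the full-horizon estimate, which depends on the model for every step of the rollout. Intermediate values interpolate between the two.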


Videos

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Cardiovascular Function and Risk Factors · Advanced Control Systems Optimization