Blending MPC & Value Function Approximation for Efficient Reinforcement Learning
Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots

TL;DR
This paper introduces a hybrid approach that combines Model-Predictive Control (MPC) with value function approximation learned through reinforcement learning (RL). The approach improves control performance under model bias and increases sample efficiency.
Contribution
It presents a novel framework that integrates MPC with RL through a single parameter, λ, which balances model-based estimates against learned value estimates. The framework is supported by theoretical analysis and empirical validation.
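One way to picture a λ-controlled blend, by analogy with TD(λ), is to weight the n-step estimates along a single model rollout geometrically. The sketch below rests on several assumptions. It uses the standard truncated λ-return over a horizon H. It uses predicted stage costs (lower is better). It uses a learned value estimate `V_hat`. The function names are hypothetical, and this is not necessarily the paper's exact formulation. With λ = 1 the blend reduces to the full H-step model-based (MPC-style) estimate. With λ = 0 it reduces to a one-step estimate that leans almost entirely on the learned value.

```python
from typing import Sequence


def n_step_estimates(costs: Sequence[float], values: Sequence[float], gamma: float) -> list[float]:
    """Return the n-step estimates G^(n), n = 1..H, along one model rollout.

    costs[t]:  predicted stage cost c(s_t, a_t) under the approximate model, t = 0..H-1
    values[t]: learned value estimate V_hat(s_t), t = 0..H (values[0] is unused)
    G^(n) = sum_{t<n} gamma^t * costs[t] + gamma^n * values[n]
    """
    horizon = len(costs)
    if len(values) != horizon + 1:
        raise ValueError("values must have one more entry than costs")
    estimates, discounted_cost = [], 0.0
    for n in range(1, horizon + 1):
        discounted_cost += gamma ** (n - 1) * costs[n - 1]
        estimates.append(discounted_cost + gamma ** n * values[n])
    return estimates


def lambda_blend(costs: Sequence[float], values: Sequence[float], gamma: float, lam: float) -> float:
    """Truncated lambda-return: (1-lam) * sum_{n<H} lam^(n-1) G^(n) + lam^(H-1) G^(H).

    The weights sum to 1, so lam interpolates between G^(1) (lam=0) and G^(H) (lam=1).
    """
    g = n_step_estimates(costs, values, gamma)
    horizon = len(g)
    blended = (1.0 - lam) * sum(lam ** (n - 1) * g[n - 1] for n in range(1, horizon))
    return blended + lam ** (horizon - 1) * g[horizon - 1]


def lambda_blend_recursive(costs: Sequence[float], values: Sequence[float], gamma: float, lam: float) -> float:
    """Same quantity via the backward recursion
    g_t = c_t + gamma * ((1 - lam) * V_hat(s_{t+1}) + lam * g_{t+1}),
    starting from g_{H-1} = c_{H-1} + gamma * V_hat(s_H).
    """
    horizon = len(costs)
    g = costs[horizon - 1] + gamma * values[horizon]
    for t in range(horizon - 2, -1, -1):
        g = costs[t] + gamma * ((1.0 - lam) * values[t + 1] + lam * g)
    return g


# Usage with hypothetical numbers for a 3-step rollout.
costs = [1.0, 2.0, 3.0]           # predicted costs c(s_0, a_0) .. c(s_2, a_2)
values = [0.0, 10.0, 20.0, 30.0]  # V_hat(s_0) .. V_hat(s_3)
for lam in (0.0, 0.5, 1.0):
    print(lam, lambda_blend(costs, values, 0.9, lam), lambda_blend_recursive(costs, values, 0.9, lam))
```

With these numbers and γ = 0.9, λ = 0 gives the one-step estimate 1 + 0.9·10 = 10, and λ = 1 gives the full 3-step estimate 27.1. The recursive form shows that the blend can be computed in one backward pass over the rollout.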
Findings
Achieves performance comparable to MPC with access to the true dynamics, even under severe model bias
Reduces reliance on accurate models over time
More sample-efficient than pure model-free RL
Abstract
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems that uses a model to make predictions about future behavior. For each state encountered, MPC solves an online optimization problem to choose a control action that will minimize future cost. This is a surprisingly effective strategy, but real-time performance requirements warrant the use of simple models. If the model is not sufficiently accurate, then the resulting controller can be biased, limiting performance. We present a framework for improving on MPC with model-free reinforcement learning (RL). The key insight is to view MPC as constructing a series of local Q-function approximations. We show that by using a parameter λ, similar to the trace decay parameter in TD(λ), we can systematically trade off learned value estimates against the local Q-function approximations. We…
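The idea of MPC as a local Q-function approximation can be made concrete with a toy example. This is a minimal sketch under stated assumptions. The system is one-dimensional. `biased_model` is a hypothetical model that underestimates actuation. The stage cost is quadratic. `v_hat` is a hypothetical learned terminal value. The H-step lookahead is solved by exhaustive search over a small action grid, not by the sampling-based optimizers typically used in practice. For each candidate first action a, the minimized rollout cost serves as a local estimate of Q(s, a), and the controller executes the argmin.

```python
from itertools import product
from typing import Callable, Sequence

Model = Callable[[float, float], float]  # (s, a) -> predicted next state
Cost = Callable[[float, float], float]   # (s, a) -> stage cost
Value = Callable[[float], float]         # s -> learned terminal value estimate


def rollout_cost(model: Model, cost: Cost, value: Value, s0: float,
                 actions: Sequence[float], gamma: float) -> float:
    """Discounted cost of an action sequence under the model, plus discounted terminal value."""
    s, total = s0, 0.0
    for t, a in enumerate(actions):
        total += gamma ** t * cost(s, a)
        s = model(s, a)
    return total + gamma ** len(actions) * value(s)


def local_q(model: Model, cost: Cost, value: Value, s0: float, a0: float,
            action_grid: Sequence[float], horizon: int, gamma: float) -> float:
    """Local Q estimate at (s0, a0): the first action is fixed and the remaining
    horizon-1 actions are chosen by exhaustive search over the action grid."""
    return min(
        rollout_cost(model, cost, value, s0, (a0, *tail), gamma)
        for tail in product(action_grid, repeat=horizon - 1)
    )


def mpc_step(model: Model, cost: Cost, value: Value, s0: float,
             action_grid: Sequence[float], horizon: int = 3, gamma: float = 0.9):
    """Return the first action minimizing the local Q estimate, plus all Q estimates."""
    q = [local_q(model, cost, value, s0, a, action_grid, horizon, gamma) for a in action_grid]
    best = min(range(len(action_grid)), key=q.__getitem__)
    return action_grid[best], q


# Hypothetical toy problem: drive a 1-D state toward 0.
def biased_model(s: float, a: float) -> float:
    return s + 0.8 * a  # assumed model; the "true" system is not modeled here


def stage_cost(s: float, a: float) -> float:
    return s * s + 0.1 * a * a


def v_hat(s: float) -> float:
    return s * s  # stand-in for a learned value function


GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]
action, q_values = mpc_step(biased_model, stage_cost, v_hat, 1.0, GRID)
print(action, [round(q, 4) for q in q_values])
```

From the state s = 1, the most negative action has the lowest local Q estimate (about 1.1976). Each call to `mpc_step` re-solves the lookahead from the current state, so MPC produces a fresh local Q approximation at every state it visits.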
