Soft MPCritic: Amortized Model Predictive Value Iteration
Thomas Banker, Nathan P. Lawrence, and Ali Mesbah

TL;DR
Soft MPCritic combines reinforcement learning and model predictive control by learning in value space and using sample-based planning, enabling scalable, robust control for complex tasks.
Contribution
The paper introduces an amortized warm-start strategy that reuses previously planned action sequences, and it pairs MPPI-based MPC with fitted value iteration, making hybrid RL-MPC control computationally practical.
Findings
Achieves effective control on classic and complex tasks.
Enables scalable RL-MPC with short-horizon planning.
Demonstrates practical synthesis of MPC policies in challenging settings.
Abstract
Reinforcement learning (RL) and model predictive control (MPC) offer complementary strengths, yet combining them at scale remains computationally challenging. We propose soft MPCritic, an RL-MPC framework that learns in (soft) value space while using sample-based planning for both online control and value target generation. soft MPCritic instantiates MPC through model predictive path integral control (MPPI) and trains a terminal Q-function with fitted value iteration, aligning the learned value function with the planner and implicitly extending the effective planning horizon. We introduce an amortized warm-start strategy that recycles planned open-loop action sequences from online observations when computing batched MPPI-based value targets. This makes soft MPCritic computationally practical, while preserving solution quality. soft MPCritic plans in a scenario-based fashion with an…
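To make the planner described in the abstract concrete, here is a minimal NumPy sketch of one MPPI update with a learned terminal value and a warm-start action sequence. It is an illustration under stated assumptions, not the authors' implementation. The function `mppi_plan`, its parameters, the toy dynamics and reward, and the use of a state-only terminal value (instead of the terminal Q-function the abstract mentions) are all hypothetical choices made for brevity. The log-mean-exp of sampled returns is used as one common form of a soft value estimate.

```python
import numpy as np

def mppi_plan(state, dynamics, reward, terminal_value, warm_start,
              num_samples=256, sigma=0.5, temperature=1.0, gamma=0.99, rng=None):
    """One MPPI update around a warm-start action sequence (illustrative sketch).

    Returns the updated open-loop action sequence and a soft (log-mean-exp)
    estimate of the value at `state`.
    """
    rng = np.random.default_rng() if rng is None else rng
    horizon, action_dim = warm_start.shape

    # Sample perturbed action sequences around the warm start.
    noise = rng.normal(0.0, sigma, size=(num_samples, horizon, action_dim))
    actions = warm_start[None] + noise

    # Roll every sample through the model and accumulate discounted reward.
    states = np.repeat(state[None], num_samples, axis=0)
    returns = np.zeros(num_samples)
    for t in range(horizon):
        returns += gamma**t * reward(states, actions[:, t])
        states = dynamics(states, actions[:, t])

    # The terminal value stands in for rewards beyond the planning horizon.
    returns += gamma**horizon * terminal_value(states)

    # Exponentially weight samples by return (shifted for numerical stability).
    shifted = (returns - returns.max()) / temperature
    weights = np.exp(shifted)
    weights /= weights.sum()
    new_plan = np.einsum("k,kha->ha", weights, actions)

    # Soft value: temperature-scaled log-mean-exp of the sampled returns.
    soft_value = returns.max() + temperature * np.log(np.mean(np.exp(shifted)))
    return new_plan, soft_value


# Hypothetical toy problem: drive a scalar state toward zero.
def dynamics(s, a):
    return s + a

def reward(s, a):
    return -(s[:, 0] ** 2) - 0.01 * (a[:, 0] ** 2)

def terminal_value(s):
    return -(s[:, 0] ** 2)

rng = np.random.default_rng(0)
plan, value = mppi_plan(np.array([2.0]), dynamics, reward, terminal_value,
                        warm_start=np.zeros((3, 1)), num_samples=512,
                        sigma=1.0, temperature=0.05, rng=rng)

# Shift the plan one step so it can seed the next call as a warm start.
next_warm_start = np.vstack([plan[1:], plan[-1:]])
```

In this sketch, `value` could serve as a regression target when fitting the terminal value, and `next_warm_start` shows one simple way a stored plan might be recycled. How soft MPCritic batches and amortizes these steps is specified in the paper itself.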
