EfficientTDMPC: Improved MPC Objectives for Sample-Efficient Continuous Control
Thomas Evers, Cristian Meo, Wendelin Bohmer, Justin Dauwels, Yaniv Oren

TL;DR
EfficientTDMPC is a novel model-based reinforcement learning method that enhances sample efficiency in continuous control by reducing return estimate errors through ensemble modeling and uncertainty penalties.
Contribution
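It introduces an ensemble of dynamics models, averaging return estimates across models and rollout depths, and an optional uncertainty penalty on the planner objective, to improve the accuracy of return estimates in TD-MPC algorithms.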
Findings
Achieves state-of-the-art sample efficiency on HumanoidBench-Hard and DMC hard benchmarks.
Matches state-of-the-art performance on the DMC easy benchmarks.
Benefits from higher update-to-data ratios, further improving sample efficiency.
Abstract
We introduce EfficientTDMPC, a sample-efficient model-based reinforcement learning method for continuous control built on the TD-MPC family of algorithms. Central to this family is a planner that aims to find an action sequence that maximizes the estimated return. The return is estimated using a learned model and value networks, each of which can introduce error. EfficientTDMPC proposes to reduce this error in two ways. First, it introduces an ensemble of dynamics models and averages the return estimates across those models and across different rollout depths. Second, it adds the option to apply an uncertainty penalty to the planner objective, yielding a planner that avoids actions with uncertain return estimates. It then adds practical improvements which increase buffer data freshness and reduce compute. Lastly, we find that our contributions enable EfficientTDMPC to benefit more from…
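The two error-reduction ideas described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `penalized_return`, the array layout, the use of the standard deviation over all estimates as the uncertainty measure, and the `penalty_coef` weight are assumptions made for illustration only. For one candidate action sequence, each ensemble member supplies predicted rewards and a bootstrap value at every rollout depth; the sketch forms one return estimate per (model, depth) pair, averages them, and optionally subtracts a spread-based penalty.

```python
import numpy as np

def penalized_return(rewards, values, gamma=0.99, penalty_coef=0.0):
    """Score one candidate action sequence (illustrative sketch only).

    rewards: shape (M, H), reward predicted by each of M ensemble models
             at each of H rollout steps.
    values:  shape (M, H), value estimate of the state each model reaches
             after step h (used to bootstrap a depth-(h+1) estimate).
    Returns (mean_estimate, spread, objective).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    _, horizon = rewards.shape

    # Discounted cumulative reward up to and including each step.
    discounts = gamma ** np.arange(horizon)
    cum_rewards = np.cumsum(rewards * discounts, axis=1)

    # One return estimate per (model, depth): rewards so far plus the
    # discounted bootstrap value of the state reached at that depth.
    bootstrap_discounts = gamma ** np.arange(1, horizon + 1)
    estimates = cum_rewards + bootstrap_discounts * values

    mean_estimate = estimates.mean()   # average over models and depths
    spread = estimates.std()           # assumed uncertainty measure
    objective = mean_estimate - penalty_coef * spread
    return mean_estimate, spread, objective
```

A planner would evaluate this objective for each candidate action sequence and prefer the highest-scoring one; with `penalty_coef=0.0` the penalty is switched off and only the averaging remains.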
