Objective Mismatch in Model-based Reinforcement Learning
Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra

TL;DR
This paper identifies a fundamental objective mismatch in model-based reinforcement learning: training the dynamics model to maximize the likelihood of one-step-ahead predictions does not always improve control performance. It also proposes a re-weighting approach to mitigate the issue.
Contribution
The paper characterizes the objective mismatch in MBRL, demonstrates its impact on control performance, and proposes a re-weighting method to mitigate it.
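
The summary above does not spell out the re-weighting rule, so the following is only a minimal sketch of what a re-weighted model-training loss could look like. It assumes squared error stands in for negative log-likelihood, and that transitions are weighted by how close their state lies to a reference trajectory. The function name, the `reference_states` argument, the exponential weighting, and `bandwidth` are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np

def weighted_one_step_loss(pred_next, true_next, states, reference_states, bandwidth=1.0):
    """Re-weighted squared error for a one-step dynamics model (illustrative).

    pred_next, true_next: (N, D) predicted and observed next states.
    states:               (N, D) states the predictions were made from.
    reference_states:     (M, D) states along a hypothetical reference trajectory.
    """
    # Distance from each training state to its nearest reference state.
    diffs = states[:, None, :] - reference_states[None, :, :]
    nearest = np.linalg.norm(diffs, axis=-1).min(axis=1)

    # Assumed weighting rule: weight decays exponentially with that distance.
    weights = np.exp(-nearest / bandwidth)
    weights = weights / weights.sum()

    # Per-transition squared prediction error, summed over state dimensions.
    per_sample = np.sum((pred_next - true_next) ** 2, axis=-1)
    return float(np.sum(weights * per_sample)), weights
```

With a weighting of this kind, prediction errors far from the reference trajectory contribute little to the loss, which matches the idea that a model useful for one task need not be accurate everywhere.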
Findings
Likelihood of one-step ahead predictions is not always correlated with control performance
Dynamics models effective for specific tasks may not need to be globally accurate
Global accuracy does not guarantee good control performance
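
One way to probe the first finding on your own experiments is to record, for each trained dynamics model, its validation log-likelihood and the episode reward the controller achieved with it, then measure how strongly the two are correlated. The helper below is a sketch of that check using the Pearson correlation coefficient; the function name and its inputs are assumptions, and no values from the paper are reproduced here.

```python
import numpy as np

def pearson_correlation(x, y):
    """Pearson correlation between two equal-length 1-D sequences.

    Intended use (hypothetical): x holds per-model validation
    log-likelihoods, y holds the matching episode rewards.
    Returns NaN when either sequence has zero variance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D sequences of equal length")

    # Center both sequences, then normalize the covariance term.
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    if denom == 0.0:
        return float("nan")
    return float(np.sum(xc * yc) / denom)
```

A coefficient near zero across many trained models would be consistent with the mismatch described above, while a value near one would suggest that likelihood is a usable proxy for control performance in that setting.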
Abstract
Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for data-efficiently learning control of continuous tasks. Recent work in MBRL has mostly focused on using more advanced function approximators and planning schemes, with little development of the general framework. In this paper, we identify a fundamental issue of the standard MBRL framework -- what we call the objective mismatch issue. Objective mismatch arises when one objective is optimized in the hope that a second, often uncorrelated, metric will also be optimized. In the context of MBRL, we characterize the objective mismatch between training the forward dynamics model w.r.t. the likelihood of the one-step ahead prediction, and the overall goal of improving performance on a downstream control task. For example, this issue can emerge with the realization that dynamics models effective for a…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Formal Methods in Verification
