Objective Mismatch in Model-based Reinforcement Learning

Nathan Lambert; Brandon Amos; Omry Yadan; Roberto Calandra

arXiv:2002.04523 · cs.LG · April 20, 2021 · 21 cites

TL;DR

This paper identifies a fundamental objective mismatch in model-based reinforcement learning (MBRL): optimizing the dynamics model for the likelihood of its one-step-ahead predictions does not always improve control performance. The authors characterize this limitation and propose a mitigation.

Contribution

The paper characterizes objective mismatch in MBRL, demonstrates its impact on control performance, and proposes re-weighting dynamics model training as an initial mitigation.
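To make the idea of re-weighting concrete, here is a minimal sketch, assuming a probabilistic dynamics model that outputs a diagonal Gaussian (a mean and a log-variance per state dimension) and using NumPy. With uniform weights the loss is the ordinary one-step-ahead negative log-likelihood; non-uniform weights shift the model's capacity toward selected transitions. The function names are hypothetical, and the distance-based weighting is an illustrative assumption, not the paper's exact scheme.

```python
import numpy as np

def gaussian_nll(mean, log_var, target):
    """Per-transition negative log-likelihood of `target` under a diagonal
    Gaussian with the given mean and log-variance, summed over state dims."""
    var = np.exp(log_var)
    per_dim = 0.5 * (np.log(2.0 * np.pi) + log_var + (target - mean) ** 2 / var)
    return per_dim.sum(axis=-1)

def weighted_one_step_nll(mean, log_var, next_states, weights=None):
    """Weighted average of the one-step-ahead NLL over a batch of transitions.
    weights=None means uniform weights, i.e. the standard likelihood objective."""
    nll = gaussian_nll(mean, log_var, next_states)
    if weights is None:
        weights = np.ones_like(nll)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * nll) / np.sum(weights))

def distance_weights(states, reference_states, scale=1.0):
    """Hypothetical weighting: a transition whose start state lies closer to a
    reference trajectory gets a larger weight, exp(-min distance / scale)."""
    # Pairwise distances, shape (num_states, num_reference_states).
    d = np.linalg.norm(states[:, None, :] - reference_states[None, :, :], axis=-1)
    return np.exp(-d.min(axis=1) / scale)
```

Passing `distance_weights(...)` as `weights` down-weights transitions far from the reference trajectory, so the model is fit mostly where the controller is expected to operate, at the cost of global accuracy.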

Findings

01

Likelihood of one-step-ahead predictions is not always correlated with control performance

02

Dynamics models effective for specific tasks may not need to be globally accurate

03

Global accuracy does not guarantee good control performance

Abstract

Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for data-efficiently learning control of continuous tasks. Recent work in MBRL has mostly focused on using more advanced function approximators and planning schemes, with little development of the general framework. In this paper, we identify a fundamental issue of the standard MBRL framework -- what we call the objective mismatch issue. Objective mismatch arises when one objective is optimized in the hope that a second, often uncorrelated, metric will also be optimized. In the context of MBRL, we characterize the objective mismatch between training the forward dynamics model w.r.t. the likelihood of the one-step ahead prediction, and the overall goal of improving performance on a downstream control task. For example, this issue can emerge with the realization that dynamics models effective for a…
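In symbols, the two objectives the abstract contrasts can be sketched as follows, assuming a probabilistic model $p_\theta$ fit to a dataset $\mathcal{D}$ of observed transitions, a reward $r$, a horizon $T$, and a controller $\pi_\theta$ that plans with model $\theta$ (notation ours, not necessarily the paper's):

```latex
\theta^\star = \arg\min_\theta \; -\sum_{(s_t, a_t, s_{t+1}) \in \mathcal{D}} \log p_\theta(s_{t+1} \mid s_t, a_t)
\qquad \text{vs.} \qquad
J(\theta) = \mathbb{E}_{\pi_\theta}\Big[ \sum_{t=0}^{T-1} r(s_t, a_t) \Big]
```

Here the expectation in $J(\theta)$ is taken over the true environment dynamics. Objective mismatch is the observation that a lower value of the first (training) objective need not yield a higher $J(\theta)$.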

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Formal Methods in Verification