Self-Correcting Models for Model-Based Reinforcement Learning

Erik Talvitie

arXiv:1612.06018 · cs.LG · July 28, 2017 · 22 cites

TL;DR

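This paper theoretically analyzes Hallucinated Replay, an approach that trains a model to correct its own errors, and derives an error bound showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. The analysis motivates an MBRL algorithm for deterministic MDPs whose performance guarantees are robust to model class limitations.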

Contribution

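It provides a theoretical analysis of self-correcting models in MBRL, identifying settings in which the approach is likely to be effective or ineffective, and proposes an MBRL algorithm for deterministic MDPs whose performance guarantees are robust to model class limitations.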

Findings

01

Training a model to correct its own errors substantially improves MBRL when the model is flawed.

02

A novel error bound shows that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error.

03

The proposed algorithm for deterministic MDPs offers performance guarantees that are robust to model class limitations.

Abstract

When an agent cannot represent a perfectly accurate model of its environment's dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. Hallucinated Replay (Talvitie 2014) trains the model to "correct" itself when it produces errors, substantially improving MBRL with flawed models. This paper theoretically analyzes this approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. These results inspire an MBRL algorithm for deterministic MDPs with performance guarantees that are robust to model class limitations.
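To make the compounding-error argument concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a hypothetical action-free deterministic chain in which the true dynamics add 2 to the state and a flawed model adds 3, so the one-step error is always 1. The names `rollout`, `hallucinated_pairs`, `true_step`, and `model_step` are illustrative and do not come from the paper or its repository. The sketch shows how composing the model's predictions makes the error grow with the horizon, and how self-correction training pairs can be formed by pairing a state the model produced with the state the environment actually reached next.

```python
from typing import Callable, List, Tuple

Step = Callable[[int], int]


def rollout(step: Step, start: int, horizon: int) -> List[int]:
    """Compose a one-step function with itself `horizon` times from `start`."""
    states = [start]
    for _ in range(horizon):
        states.append(step(states[-1]))
    return states


def hallucinated_pairs(
    true_step: Step, model_step: Step, start: int, horizon: int
) -> List[Tuple[int, int]]:
    """Build (input, target) pairs for self-correction training.

    The input is the state the model itself predicted at step t; the target
    is the real state at step t + 1. A model fit to these pairs is asked to
    map its own (possibly wrong) outputs back toward the real trajectory.
    """
    real = rollout(true_step, start, horizon)
    imagined = rollout(model_step, start, horizon)
    return [(imagined[t], real[t + 1]) for t in range(horizon)]


# Hypothetical toy dynamics (assumptions for illustration only).
def true_step(s: int) -> int:
    return s + 2


def model_step(s: int) -> int:
    return s + 3  # one-step error is exactly 1 everywhere


real = rollout(true_step, 0, 4)
imagined = rollout(model_step, 0, 4)

# Error of the composed model after t steps; it grows with the horizon even
# though the one-step error never exceeds 1.
errors = [abs(z - s) for z, s in zip(imagined, real)]
pairs = hallucinated_pairs(true_step, model_step, 0, 4)

print(errors)  # [0, 1, 2, 3, 4]
print(pairs)   # [(0, 2), (3, 4), (6, 6), (9, 8)]
```

In this toy chain, a standard one-step training set would contain only real states as inputs, so the model would never be trained on the states it actually visits during a rollout. The hallucinated pairs cover exactly those states, which is the intuition behind training a model to correct itself.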

Code & Models

Repositories

etalvitie/hdaggermc
none · Official

Taxonomy

Topics: Reinforcement Learning in Robotics