Minimax Model Learning

Cameron Voloshin; Nan Jiang; Yisong Yue

arXiv:2103.02084 · cs.LG · March 4, 2021

TL;DR

This paper introduces a new off-policy loss function for learning a transition model in model-based reinforcement learning. The loss is designed to be more robust to distribution shift and model misspecification, and the paper supports it with theoretical analysis and empirical results.

Contribution

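The paper proposes a novel off-policy loss for transition-model learning, derived from the off-policy policy evaluation objective. The loss improves robustness to distribution shift and model misspecification, and it can be integrated with off-policy optimization (OPO) techniques.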

Findings

01 Empirical improvements over existing model-based off-policy evaluation methods

02 Theoretical analysis supports the robustness benefits

03 The loss function can also be used for off-policy optimization (OPO)

Abstract

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.
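
The abstract does not state the loss itself, so the following is only a minimal sketch of one shape a minimax model-learning loss can take, under stated assumptions: a tabular setting, a finite set of candidate transition models, and finite sets of candidate weight functions `w(s, a)` and value functions `V(s)`. For each candidate model, the loss measures how far the model's predicted next-state value is from the observed next-state value, reweighted by `w`, and the model with the smallest worst-case loss is selected. The names `mml_loss` and `minimax_model`, the array layouts, and the toy data are all hypothetical and introduced here for illustration.

```python
import numpy as np


def mml_loss(P, w, V, data):
    """Empirical mean of w(s, a) * (E_{x ~ P(.|s, a)}[V(x)] - V(s')).

    P: array (S, A, S) of candidate transition probabilities (assumed layout).
    w: array (S, A) of importance-style weights (assumed layout).
    V: array (S,) of state values.
    data: tuple (s, a, s_next) of equal-length integer arrays.
    """
    s, a, s_next = data
    predicted = P[s, a] @ V  # model's expected next-state value, shape (N,)
    return np.mean(w[s, a] * (predicted - V[s_next]))


def minimax_model(models, weights, values, data):
    """Pick the model whose worst-case |loss| over all (w, V) pairs is smallest."""
    worst = [
        max(abs(mml_loss(P, w, V, data)) for w in weights for V in values)
        for P in models
    ]
    return int(np.argmin(worst)), worst


# Toy problem (hypothetical): two states, one action, deterministic swap 0 <-> 1.
P_true = np.array([[[0.0, 1.0]], [[1.0, 0.0]]])  # matches the logged data
P_bad = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])   # wrongly predicts "stay put"
data = (np.array([0, 1]), np.array([0, 0]), np.array([1, 0]))

weights = [np.ones((2, 1)), np.array([[1.0], [0.0]])]
values = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

best, worst = minimax_model([P_bad, P_true], weights, values, data)
print(best, worst)
```

In this toy setup the uniform weight alone cannot tell the two models apart, because the errors of `P_bad` cancel on average; the second weight function exposes the mismatch. That is the intuition behind taking a worst case over the weight and value classes rather than fixing a single reweighting.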

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Fuel Cells and Related Materials