Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

Fan-Ming Luo; Tian Xu; Xingchen Cao; Yang Yu

arXiv:2310.05422 · cs.LG · October 10, 2023

TL;DR

This paper introduces reward-consistent dynamics models, which use a learned dynamics reward function to improve generalization in offline reinforcement learning and yield performance gains on the D4RL and NeoRL benchmarks.

Contribution

The paper proposes reward-consistent dynamics models and the MOREC method, which filters model-generated transitions using a learned dynamics reward so that offline RL generalizes better to unseen transitions.
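The filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `filter_transitions`, the softmax resampling rule, the `num_candidates` and `temperature` parameters, and the one-dimensional toy model and toy reward are all assumptions made for illustration. In particular, the toy reward is hand-written here, whereas MOREC learns its dynamics reward from offline data.

```python
import math
import random


def filter_transitions(state, action, sample_next_state, dynamics_reward,
                       num_candidates=8, temperature=1.0, rng=None):
    """Draw several candidate next states from a dynamics model and pick one,
    favoring candidates that score higher under the dynamics reward.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    candidates = [sample_next_state(state, action, rng)
                  for _ in range(num_candidates)]
    scores = [dynamics_reward(state, action, s_next) for s_next in candidates]

    # Softmax weights over the scores; subtracting the max keeps exp() stable.
    max_score = max(scores)
    weights = [math.exp((r - max_score) / temperature) for r in scores]

    chosen = rng.choices(range(num_candidates), weights=weights, k=1)[0]
    return candidates[chosen], scores[chosen]


def toy_model(state, action, rng):
    """Toy stand-in for a learned dynamics model: true step plus Gaussian noise."""
    return state + action + rng.gauss(0.0, 1.0)


def toy_reward(state, action, next_state):
    """Toy stand-in for a learned dynamics reward: higher when the transition
    is closer to the (here, known) true dynamics s' = s + a."""
    return -abs(next_state - (state + action))


if __name__ == "__main__":
    rng = random.Random(0)
    next_state, score = filter_transitions(0.0, 1.0, toy_model, toy_reward,
                                           temperature=0.1, rng=rng)
    print(next_state, score)
```

A lower `temperature` makes the selection closer to picking the highest-scoring candidate, while a higher one approaches sampling the model's candidates uniformly.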

Findings

01. MOREC outperforms previous methods on the D4RL and NeoRL benchmarks.

02. MOREC achieves above 95% of online RL performance on multiple tasks.

03. The method demonstrates strong generalization ability, recovering unseen transitions.

Abstract

Learning a precise dynamics model can be crucial for offline reinforcement learning, which, unfortunately, has been found to be quite challenging. Dynamics models that are learned by fitting historical transitions often struggle to generalize to unseen transitions. In this study, we identify a hidden but pivotal factor termed dynamics reward that remains consistent across transitions, offering a pathway to better generalization. Therefore, we propose the idea of reward-consistent dynamics models: any trajectory generated by the dynamics model should maximize the dynamics reward derived from the data. We implement this idea as the MOREC (Model-based Offline reinforcement learning with Reward Consistency) method, which can be seamlessly integrated into previous offline model-based reinforcement learning (MBRL) methods. MOREC learns a generalizable dynamics reward function from offline…

Taxonomy

Topics: Reinforcement Learning in Robotics · Muscle activation and electromyography studies